UNDERSTANDING THE
FUNCTIONAL ROLES OF
INTRINSIC PROTEIN DISORDER IN
NFΚB TRANSCRIPTION FACTORS
LIM SHEN JEAN
B.Sc.(Hons.), NUS
A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF
SCIENCE
DEPARTMENT OF BIOCHEMISTRY
NATIONAL UNIVERSITY OF
SINGAPORE
2011
Acknowledgements
I am grateful to my supervisor, Associate Professor Tan Tin Wee, for his guidance on
my research project. Next, I would like to thank Assistant Adjunct Professor Victor
Tong and Dr. Asif Khan (Johns Hopkins University) for their valuable ideas and
advice for my project. I am also very grateful for the IT assistance provided by Mark
de Silva and Lim Kuan Siong from the Life Sciences Institute. Finally, I would like to
express my appreciation to all my colleagues, as well as the administrative staff in the
Department of Biochemistry, National University of Singapore, for their strong
support during the course of my project.
Summary
Protein dynamics, particularly intrinsic protein disorder, have been implicated in
cellular functions. Intrinsic protein disorder contributes to transcription and cell
signaling by accommodating multiple interaction partners and modification sites,
and by providing regulatory flexibility. Here, in agreement with previous studies,
I hypothesize that, analogous to the sequence conservation of functionally
important sites, intrinsic protein disorder properties are evolutionarily conserved.
To further support and test this hypothesis in the more specific context of
transcriptional regulation in cell signaling, I developed an in silico analysis pipeline
for the identification of intrinsically disordered protein residues, data mining and
in-depth analysis of the conservation, localization and function of predicted disordered
regions. The Nuclear Factor Kappa-light-chain-enhancer of Activated B cells
(NFκB/Rel), important for a variety of processes including cell survival, inflammation
and immunity, was chosen as the exemplar protein for this study.
The findings highlight distinctive key roles of conserved disordered and
non-disordered residues in different aspects of NFκB function. Differences in the
distribution and conservation patterns of protein disorder in each NFκB protein type
raise the possibility of conserved disorder signatures in different protein families,
which, if true, will prove valuable for functional characterization.
On a larger scale, this project offers a meaningful perspective for understanding
protein function through intrinsic protein disorder. The analysis pipeline developed
in this study will be instrumental for large-scale functional studies of protein families.
Findings from this project will also contribute to scientific knowledge in
transcriptional regulation and cell signaling.
List of Tables
Table 1. Ranges of timescales and amplitudes where protein dynamics have been
reported to occur.
Table 2. Performance comparison of primary and meta-predictors for disorder
prediction at their respective optimum thresholds. The predictive performance of
MetaDisorder MD2 and P+F (DisBatch) is highlighted in bold.
List of Figures
Figure 1. The two types of protein dynamics (or protein motions) and their
distribution, relative to protein structure.
Figure 2. A) Bar plot of mean accuracy values of primary and meta disorder
predictors at their respective optimum thresholds, with standard error estimates. B)
Boxplot of accuracy values of primary and meta disorder predictors at their respective
optimum thresholds. Each boxplot depicts the minimum accuracy value, lower
quartile, median, upper quartile, maximum accuracy value and any outlier
observation(s) for each predictor. The boxplot for MetaDisorder MD2 and P+F
(DisBatch) is highlighted in grey.
Figure 3. Sequence submission page of DisBatch. DisBatch is available at
http://bioslax01.bic.nus.edu.sg/meta/.
Figure 4. Output page of DisBatch. The page provides download links for each output
file, and a link to the help page at the bottom of the page.
Figure 5. Detailed sequence inclusion and exclusion criteria for records in NFκB
Base.
Figure 6. Number of records present in NFκB Base (Release: Beta 2.0) for each
NFκB protein type. NFκB Base is available at
http://proline.bic.nus.edu.sg/~shenjean/nfkb/.
Figure 7. A typical entry page of NFκB Base. Each entry contains information, where
available, on source accession, NFκB protein type, description, organism, gene name,
chromosome name, sequence length, accession number(s) of duplicate record(s) and
cross-links to major online databases, including NCBI Protein (sequence database),
UniProt (sequence database), GO (Gene Ontology database), HGNC (gene
nomenclature database), InterPro (protein domain and family database), PDB (protein
structure database), PubMed (literature database) and NCBI Taxonomy (taxonomy
database).
Figure 8. Sample keyword search output of NFκB Base, displaying the accession
number, source accession number, organism and description fields. NFκB Base
supports keyword searches in all or specific fields, where users can submit a query at
the top of every page, shown in the upper frame of this figure.
Figure 9. The Browse page of NFκB Base with jQuery supported dynamic data
search and display.
Figure 10. BLAST interface for NFκB Base.
Figure 11. Distribution of the average disorder score at each alignment position for
Class I NFκB proteins at the RHD domain of A) NFκB1, B) NFκB2 and C) Relish, as
predicted by DisBatch. The average disorder score cutoffs of 0.5 and 1.5 were used to
distinguish between moderately (predicted only by PrDOS to be disordered) and
highly disordered (predicted by both PrDOS and FoldIndex) residues, respectively.
Shannon’s entropy values were also plotted in the graph for comparison.
Figure 12. Distribution of the average disorder score at each alignment position for
Class II NFκB proteins at the RHD domain of A) RelA, B) RelB, C) C-Rel, D) Dorsal
and E) Dif, as predicted by DisBatch.
Figure 13. Distribution of the average disorder score at each alignment position for
Class I NFκB proteins at the IPT domain of A) NFκB1, B) NFκB2 and C) Relish, as
predicted by DisBatch.
Figure 14. Distribution of the average disorder score at each alignment position for
Class II NFκB proteins at the IPT domain of A) RelA, B) RelB, C) C-Rel, D) Dorsal
and E) Dif, as predicted by DisBatch.
Figure 15. Distribution of the average disorder score at each alignment position for
Class I NFκB proteins at sites with no functional annotation in A) NFκB1, B) NFκB2
and C) Relish, as predicted by DisBatch.
Figure 16. Distribution of the average disorder score at each alignment position for
Class II NFκB proteins at sites with no functional annotation in A) RelA, B) RelB, C)
C-Rel, D) Dorsal and E) Dif, as predicted by DisBatch.
Figure 17. Distribution of the average disorder score at each alignment position for
Class I NFκB proteins at the ANK domain (in red) and Death domain (in black) of A)
NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch.
Figure 18. Scatter plot of average disorder score against the standard deviation of
disorder scores for Class I NFκB proteins, A) NFκB1, B) NFκB2 and C) Relish, as
predicted by DisBatch. The scatter plots show 2 distinct quadrants of: conserved non-disordered residues (bottom left) and conserved disordered residues (bottom right).
Functional domains and sites were annotated in the graph and coloured accordingly.
Figure 19. Scatter plot of average disorder score against the standard deviation of
disorder scores for Class II NFκB proteins, A) RelA, B) RelB and C) C-Rel, as
predicted by DisBatch.
Figure 20. (Cont’d from Figure 19) Scatter plot of average disorder score against the
standard deviation of average disorder score for Class II NFκB proteins, A) Dorsal, B)
Dif, as predicted by DisBatch.
Figure 21. Scatter plot of average disorder score against the CV of average disorder
score for Class I NFκB proteins, A) NFκB1, B) NFκB2 and C) Relish, as predicted by
DisBatch. The scatter plot shows 4 distinct quadrants of: non-conserved, non-disordered residues (top left of scatter plot), non-conserved disordered residues (top
right), conserved non-disordered residues (bottom left) and conserved disordered
residues (bottom right). Functional domains and sites were annotated in the graph and
coloured accordingly.
Figure 22. Scatter plot of average disorder score against the CV of average disorder
score for Class II NFκB proteins, A) RelA, B) RelB and C) C-Rel, as predicted by
DisBatch.
Figure 23. (Cont’d from Figure 22) Scatter plot of average disorder score against the
CV of average disorder score for Class II NFκB proteins, A) Dorsal, B) Dif, as
predicted by DisBatch.
Figure 24. Structures of representative Class I NFκB homodimers, NFκB1 (top) and
NFκB2 (bottom), coloured according to protein disorder annotations (left) and β-factors (right). The C-terminal IPT domain contains ankyrin protein binding sites
enveloping the dimerization interface. Ankyrin repeats and the Death domain were
not present in the 3D structures. The α-helical insert regions are conserved disordered
residues, highlighted in red, at the left of the protein structure in the N-terminal RHD
domain.
Figure 25. Structures of representative Class II NFκB homodimers, RelA (top) and
C-Rel (bottom), coloured according to protein disorder annotations (left) and β-factors
(right).
Figure 26. Structures of representative NFκB heterodimers formed between Class I
and Class II NFκB proteins, coloured according to protein disorder annotations (left)
and β-factors (right). Examples shown here are the RelA-NFκB1 (top) and RelB-NFκB2 (bottom) heterodimers.
Figure 27. Structures of representative RelA homodimer (top) and RelA-NFκB1
heterodimer (bottom) in the IκB inhibited state, coloured according to protein disorder
annotations (left) and β-factors (right).
List of Abbreviations
ADP - Adenosine Diphosphate
ATP - Adenosine Triphosphate
CASP - Critical Assessment of Techniques for Protein Structure Prediction
CD - Circular Dichroism
CD4 - Cluster of Differentiation 4
CGI - Common Gateway Interface
CSV - Comma Separated Values
DisProt - Database of Protein Disorder
DSSP - Dictionary of Secondary Structure of Proteins
HIV - Human Immunodeficiency Virus
HTML - HyperText Markup Language
JAK - Janus Kinase
LAMP - Linux Apache MySQL PERL/PHP/Python
MAPK - Mitogen-Activated Protein Kinase
NCBI - National Center for Biotechnology Information
NFκB - Nuclear Factor Kappa-light-chain-enhancer of Activated B Cells
NMR - Nuclear Magnetic Resonance
PI3K - Phosphatidylinositol 3-Kinase
PDB - Protein Data Bank
PONDR - Predictor Of Natural Disordered Regions
PSSM - Position-Specific Scoring Matrix
RH Domain - Rel Homology Domain
SD - Standard Deviation
STAT - Signal Transducer and Activator of Transcription
SVM - Support Vector Machine
TAD - Transactivation Domain
RMSD - Root Mean Square Deviation
Table of Contents
1 Introduction
1.1 Protein Dynamics
1.2 Functional Significance of Protein Dynamics
1.2.1 Role of Protein Dynamics in Cell Signaling
1.3 Intrinsic Protein Disorder
1.3.1 Role of Intrinsic Protein Disorder in Cell Signaling
1.3.2 Identification of intrinsic protein disorder
1.3.2.1 Computational Tools for Intrinsic Protein Disorder Prediction
1.3.2.1.1 Ab-Initio Approaches
1.3.2.1.2 Template-based Approaches
1.3.2.1.3 Meta Approaches
1.3.2.2 Benchmark Datasets for Intrinsic Protein Disorder Prediction
1.3.3 Functional Conservation of Intrinsic Protein Disorder
1.4 Hypothesis
2 Literature Review
2.1 Transcription Factors
2.2 The NFκB Transcription Factor Family
2.2.1 Mechanisms of Action of NFκB
2.2.2 NFκB in Human Diseases
2.3 Computational analysis of NFκB proteins
2.3.1 Systems analysis of NFκB signaling machinery
2.3.2 Sequence Analysis of NFκB
2.3.2.1 Structural Analysis of NFκB
2.4 Protein Dynamics Analysis of NFκB
2.4.1 Intrinsic Protein Disorder Analysis of NFκB
2.5 Limitations of reported studies
2.6 Research Aims and Objectives
3 DisBatch: A Faster Meta-Prediction System for Large-Scale Identification of Intrinsically Disordered Protein Regions
3.1 Background
3.2 Materials and Methods
3.2.1 Server Infrastructure
3.2.2 Primary Disorder Predictor Selection
3.2.3 Meta-predictor Development
3.2.4 Performance Evaluation
3.2.5 Performance Measures
3.2.6 Web Interface
3.3 Results
3.3.1 Predictive Performance
3.3.2 Features
3.4 Discussion
3.4.1 Predictive Performance
3.4.2 Scoring Algorithm
3.4.3 Benchmark Model
3.4.4 Testing Dataset
3.4.5 Software Limitation
3.5 Future Work
3.6 Chapter Conclusion
4 NFκB Base: A Specialized Database of NFκB Proteins
4.1 Background
4.2 Materials and Methods
4.2.1 Server Infrastructure
4.2.2 Sequence Data Collection
4.2.2.1 Inclusion and Exclusion Criteria
4.2.3 Database Design
4.2.4 Web Interface
4.2.5 Results
4.2.5.1 NFκB Base Content
4.2.5.2 Features
4.2.5.2.1 Keyword Search
4.2.5.2.2 Sequence Similarity Search
4.2.5.2.3 Batch Download
4.2.6 Discussion
4.2.7 Future Work
4.2.7.1 Community Annotation Policy
4.2.8 Chapter Conclusion
5 The Role of Conserved Disordered Residues in NFκB Function
5.1 Background
5.2 Materials and Methods
5.2.1 Sequence Data Collection
5.2.2 Multiple Sequence Alignment
5.2.3 Entropy Analysis
5.2.4 Intrinsic Protein Disorder Analysis
5.2.5 Conservation of Intrinsic Protein Disorder
5.2.6 Structural Analysis
5.3 Results
5.3.1 Conserved intrinsic protein disorder signatures in NFκB
5.3.2 Structural Analysis
5.4 Discussion
5.5 Future Work
5.6 Chapter Conclusion
6 Conclusion
7 References
1 Introduction
1.1 Protein Dynamics
Protein structures are dynamic in nature and undergo motion – a property that is an
integral part of their function[1-3].
Protein dynamics (or protein motion) occurs over a wide range of amplitudes and
timescales. For example, simple local internal motions, such as bond and angle
rotations, occur on a femto- to picosecond timescale[4]. Side-chain and loop motions
occur on a pico- to nanosecond time scale, while global external motions involving
large-scale conformational rearrangements occur on a micro- to millisecond
timescale[5,6]. Molecular interactions and binding occur on the second timescale
(Table 1)[2]. Additionally, complex, orchestrated protein motions, such as those
involving molecular motors, have also been observed[3].
Table 1. Ranges of timescales and amplitudes where protein dynamics have been reported to occur.
Timescale     Examples                               Amplitude
Femtosecond   Bond and angle vibrations              < 0.001 - 0.1 Å
Picosecond    Side chain rotations                   0.1 - 1 Å
Nanosecond    Hinge bending at domain interfaces     1 - 10 Å
Microsecond   Helix-coil transitions                 10 Å - 100 Å
Millisecond   Protein folding, actin-myosin motion   10 Å - 100 Å
>1 second     Molecular interaction, binding         10 - >100 Å
Figure 1. The two types of protein dynamics (or protein motions) and their distribution, relative to protein
structure.
Across timescales and amplitudes, protein dynamics can be broadly categorized into
internal and external motion[7]. Internal motion involves the deformation of protein
segment(s) such as bond, angle or side-chain rotations[7]. External motion, on the
other hand, encompasses the translational and rotational motions of protein
segment(s), such as hinge and shear motion, involving the protein backbone (Figure
1)[7,8].
Besides well-structured, ordered regions of proteins, protein dynamics have also been
studied in non-globular, unstructured and/or flexible regions (to be referred to as
intrinsically disordered regions)[9], where they contribute to a number of important
functions. Intrinsically disordered regions will be described in detail in Section 1.3.
1.2 Functional Significance of Protein Dynamics
Protein dynamics are fundamentally involved in important biological events, such as
protein folding, conformational changes and protein-protein interactions[2]. These
events are in turn vital to a large array of essential biological processes and
functions[1,3,6,10-12].
An example is the crucial role of protein dynamics in muscle contraction[6]. Muscle
contraction involves the cross-bridge cycle, with the first step involving adenosine
triphosphate (ATP) binding to the myosin head. Binding of the myosin head to actin
myofilaments, and calcium to the complex, leads to changes in electrostatic charges
and cross-bridge formation. Subsequent hydrolysis of ATP to adenosine diphosphate
(ADP) alters the conformation of the head of the cross-bridge and produces energy for
the pulling movement of the actin filament towards the centre of the cell. Finally, the
release of ADP disrupts binding with the actin filament and restarts the cycle with the
next ATP binding event, in the presence of calcium ions.
At a smaller scale, protein dynamics is also involved in human immunodeficiency
virus (HIV) infection[12]. This is mediated through the binding of the envelope
glycoprotein, gp120, to a cluster of differentiation 4 (CD4) receptor. Briefly, the binding event causes
conformational changes in gp120, in turn promoting the binding of HIV-1 to
chemokine receptors on the host cell, such as CCR5 or CXCR4. This activates the
gp41 protein and promotes the fusion of the HIV outer membrane with the host cell,
thereby permitting viral entry and infection.
1.2.1 Role of Protein Dynamics in Cell Signaling
An important process where protein dynamics plays an especially significant role is in
cell signaling[10,11]. Cell signaling involves specific recognition sites and strict
regulation of participating proteins to coordinate molecular interactions at intra- and/or inter-pathway levels, ultimately resulting in combinatorial functional diversity.
The dynamics of vital signaling proteins, such as calmodulin, p53, BRCA1 and
MAP2, and their functional significance have been investigated[10,11,13-15]. Many
of these proteins partake in local internal motion via intrinsically disordered residues
that facilitate multiple molecular recognition mechanisms, interactions and
regulation[13-15].
1.3 Intrinsic Protein Disorder
Previous examples in Section 1.2 illustrate the functional role of protein dynamics in
protein segments or regions with stable, localized structures. Conventional ideas,
based on the “lock-and-key” model, highlighted the functional importance of stable,
localized structures. However, there has been increasing evidence that non-globular
domains with unstable and flexible structures, termed intrinsically (or natively)
disordered proteins or protein regions, are also important for function[9,16,17].
Intrinsically disordered proteins lead to poor protein expression and therefore pose
difficulties in protein purification and crystallization, hindering high throughput
structural determination[18].
Functional sites, mainly short linear motifs such as sorting signals, targeting signals,
protein ligands and post-translational modification sites, have been observed in
intrinsically disordered proteins and regions[18]. To date, many intrinsically
disordered proteins and protein regions have been reported[19,20]. These proteins and
regions have been discovered to be either completely or largely disordered, becoming
structured only in their bound states (e.g. CREB-CBP complex [21]) or in the
presence of changes in the biochemical environment [19,20]. Intrinsically disordered
proteins and protein regions have been reported to engage multiple binding partners
and are involved in many biological events and pathways, especially during cell
signaling[14,15,22-24].
1.3.1 Role of Intrinsic Protein Disorder in Cell Signaling
In the context of cell signaling, intrinsically disordered proteins and regions have been
associated with many regulatory events. Intrinsic protein disorder confers various
functional advantages, which include the capability to i) accommodate more
interaction partners and modification sites, ii) provide flexibility in regulation with
multiple, relatively low affinity linear interaction sites, iii) provide regulation
specificity with fewer linear motif types and iv) provide large intermolecular
interfaces with smaller protein, genome and cell sizes[25].
For example, the recognition of DNA by disordered peptides has been shown to be
involved in the regulation of gene expression by transcription, epigenetic
modifications and gene silencing[26].
1.3.2 Identification of intrinsic protein disorder
Intrinsically disordered proteins and protein regions can be indirectly observed
experimentally, using X-ray crystallography, Nuclear Magnetic Resonance (NMR-),
Raman-, Circular Dichroism (CD-) spectroscopy and hydrodynamic
measurements[18]. These laboratory methods recognize different types of protein
disorder, giving rise to various definitions of intrinsic protein disorder, such as highly
flexible regions, regions lacking a secondary structure or regions lacking a well-defined tertiary structure[18,27].
Experimental methods for detecting intrinsic protein disorder are often hampered by
the lack of stable protein structures[27]. To overcome this limitation, various
computational tools have been developed for the prediction of intrinsically disordered
proteins and protein regions from primary protein sequences[27].
1.3.2.1 Computational Tools for Intrinsic Protein Disorder Prediction
Various definitions have been used to describe intrinsically disordered protein
regions[18]. Consequently, computational tools designed for the prediction of
intrinsic protein disorder utilize different approaches, based on different operational
definitions of intrinsic protein disorder[18]. They can be broadly classified into ab-initio approaches, template-based approaches and meta approaches[28].
1.3.2.1.1 Ab-Initio Approaches
Ab-initio approaches utilize only sequence-derived information for disorder
prediction. They originated from early methods that detect low-complexity regions in
protein sequences, such as SEG[9,29]. Wootton’s study on compositionally biased
regions in sequence databases illustrated the association between these regions and
non-globular domains[9]. However, these methods have been shown to produce
copious false hits, since the correlation between disordered regions and low sequence
complexity does not always hold true. More refined methods have since been
designed[30].
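The link between low sequence complexity and candidate non-globular regions can be illustrated with a sliding-window Shannon entropy scan. This is a minimal sketch of the general idea only and is not the SEG algorithm itself; the function names, the window length and the entropy cutoff are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of one window."""
    counts = Counter(window)
    total = len(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def low_complexity_windows(sequence: str, window: int = 12, cutoff: float = 2.2):
    """Return (start, entropy) for each window whose entropy is below cutoff.

    start is a 0-based index; window and cutoff are illustrative values.
    """
    hits = []
    for start in range(len(sequence) - window + 1):
        h = shannon_entropy(sequence[start:start + window])
        if h < cutoff:
            hits.append((start, h))
    return hits
```

A homopolymeric stretch such as a poly-glutamine tract gives an entropy of zero and is always flagged, whereas a window of 12 distinct residues gives log2(12) bits and is not. As noted above, such hits are only candidates, since low complexity and disorder do not always coincide.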
The earliest prediction system developed specifically for intrinsic protein disorder
prediction was the suite of PONDR® (Predictor Of Natural Disordered Regions)
neural network predictors, which identify intrinsically disordered regions based on
properties such as local amino acid composition, flexibility, hydropathy and
coordination number[31]. Subsequent examples include the FoldIndex software, in
which prediction is based on the average residue hydrophobicity and net charge[32].
IUPred is another tool in which intrinsic protein disorder is predicted through
estimates of the capability of amino acid residues to form stable, favourable contacts
based on pair-wise energy content[33]. IUPred adopted the underlying assumption
that in contrast to globular proteins, intrinsically disordered proteins are not capable
of forming a large number of stable, favourable interactions[33].
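The hydrophobicity and net charge criterion used by FoldIndex can be sketched as follows. The sketch assumes the commonly cited linear boundary between folded and unfolded proteins, 2.785<H> - |<R>| - 1.151, with Kyte-Doolittle hydropathy rescaled to the range 0 to 1. The function name, the evaluation of a whole segment rather than a sliding window, and the simplified charge assignment (+1 for K and R, -1 for D and E) are assumptions made for illustration; this is not the FoldIndex software.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Simplified side-chain charges at neutral pH (assumption: His treated as neutral).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def fold_index(segment: str) -> float:
    """Positive values suggest a folded segment, negative values an unfolded one."""
    n = len(segment)
    # Rescale hydropathy from [-4.5, 4.5] to [0, 1] before averaging.
    mean_hydropathy = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in segment) / n
    mean_net_charge = sum(CHARGE.get(aa, 0) for aa in segment) / n
    return 2.785 * mean_hydropathy - abs(mean_net_charge) - 1.151
```

For instance, a strongly hydrophobic segment such as "IIIILLLL" scores above zero, while a charged, polar segment such as "EEEEKKKK" scores below zero and would be called unfolded under this criterion.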
Some ab-initio methods derive secondary and/or tertiary structure information from
input protein sequences to check for the presence of loops or coils, which are
considered to be non-regular secondary structures. For example, GlobPlot[34]
calculates Russell/Linding propensities for input amino acid residues to be in regular
secondary structures (α-helices or β-strands) and non-regular secondary structures,
defined by the Dictionary of Secondary Structure of Proteins (DSSP)[35],
respectively. On the other hand, DISOPRED2[36] and the DisEMBL REMARK465
predictors were trained on Protein Data Bank (PDB)[37] structural data[18] to
identify amino acid residues present in the sequence but missing in X-ray structures.
DisEMBL also predicts protein disorder by detecting “hot loops”, utilizing both
secondary and tertiary structure information derived from input sequences[18]. The
algorithm detects highly dynamic DSSP-defined loops/coils with high B-factors (C-α
temperature factors), according to the training set of PDB[37] structure data[18].
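To illustrate where such "missing residue" training labels come from, the following minimal sketch lists the residues that a PDB entry declares as absent from the experimental model by reading its REMARK 465 records. The function name and the whitespace-based parsing are assumptions made for illustration; this is not the DisEMBL or DISOPRED2 training code.

```python
def missing_residues(pdb_text):
    """Return (chain, residue_number, residue_name) for REMARK 465 records.

    Hypothetical helper. It assumes whitespace-separated data lines such as
    'REMARK 465     MET A     1', optionally preceded by a model number.
    Residue numbers carrying insertion codes (e.g. '12A') are skipped.
    """
    missing = []
    for line in pdb_text.splitlines():
        if not line.startswith("REMARK 465"):
            continue
        fields = line.split()[2:]
        # Drop a leading model number (present for multi-model entries).
        if len(fields) == 4 and fields[0].isdigit():
            fields = fields[1:]
        if len(fields) != 3:
            continue  # explanatory text or column header line
        name, chain, number = fields
        try:
            missing.append((chain, int(number), name))
        except ValueError:
            continue  # three-word text line without a residue number
    return missing
```

Residues returned by such a helper would be labelled as disordered, and residues with coordinates as ordered, when assembling a training set.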
1.3.2.1.2 Template-based Approaches
Template-based approaches perform comparisons of input data with similar sequence
or structure data to determine intrinsic protein disorder. For example, PrDOS[38]
performs PSI-BLAST searches of query protein sequences against structural datasets
of homologous proteins to predict intrinsically disordered residues, in addition to its
support vector machine (SVM) algorithm trained on position-specific scoring
matrices (PSSM). DISOclust[39] performs template-based prediction by first
determining the per-residue error of the input protein sequence in multiple protein
fold recognition models, built from homologous templates, followed by analysis of
the conservation of per-residue error across these models.
1.3.2.1.3 Meta Approaches
Meta approaches are tools, termed meta-predictors, which combine the prediction
results of multiple prediction methods. The availability of primary intrinsic protein
disorder prediction tools has sparked increased research interest in meta-predictors,
which have demonstrated higher prediction accuracies than primary predictors.
An example of a meta-prediction system is Meta-Disorder (MD) predictor, which
integrates prediction results from orthogonal sources of information and explicit
predictions of secondary structure, solvent accessibility and other sequence properties,
as inputs to neural networks for model training[40]. Subsequently, MD selects the
optimum algorithm for disorder prediction[40]. GeneSilico MetaDisorder MD2 is another
example of a high-performance meta-predictor[41]. The genetic algorithm-based
system first combines and weights the results of 15 primary predictors, based on
accuracy. Subsequently, it collects the best alignments from eight fold-recognition
methods and infers protein disorder from alignment gaps. Other meta-predictors
reported in the literature include metaPrDOS[42] and PONDR-FIT[43]. In support of
meta-prediction efforts, a metaserver, MeDor[44], has also been developed to
facilitate easy retrieval and visualization of results from primary disorder prediction
systems.
1.3.2.2 Benchmark Datasets for Intrinsic Protein Disorder Prediction
To provide further impetus for intrinsic protein disorder prediction, the worldwide
Critical Assessment of Techniques for Protein Structure Prediction (CASP)
experiments have, since 2002, included a category for protein disorder prediction
that uses blind benchmark datasets[45].
Intrinsic protein disorder prediction has also been facilitated by the availability of the
Database of Protein Disorder (DisProt) since 2005[46]. DisProt is a specialized
database containing sequences across multiple species annotated with experimentally
verified intrinsically disordered regions[46].
1.3.3 Functional Conservation of Intrinsic Protein Disorder
The functional importance of intrinsically disordered proteins and protein regions
raises the likelihood that intrinsically disordered protein residues are evolutionarily
conserved. This proposal is in line with studies demonstrating that protein dynamics
properties, such as protein backbone flexibility, protein side-chain dynamics and
protein vibrational dynamics, are conserved[47-50].
Conservation of protein disorder has been studied by Chen et al., who demonstrated
that intrinsically disordered regions are conserved in protein domains and
families[51]. Reports have also shown that evolutionary conservation and
maintenance of protein disorder are costly and therefore non-trivial and non-random,
further supporting its indispensable functional significance[26,52-54].
1.4 Hypothesis
In the context of cell signaling, the evidence outlined in previous sections implies that
cell signaling proteins generally possess varying degrees of protein
dynamics[10,11,22]. These dynamics modulate changes in binding affinity and
specificity, which are in turn responsible for generating downstream functional
diversity in signaling pathways. In addition, dynamic properties of proteins have been
found to be encoded in their primary sequences and conserved in protein domains and
families[10,29]. Nevertheless, to date, in-depth analysis of the correlation between
conservation of dynamic properties and sequence and functional conservation is
lacking in the literature. In view of the importance of intrinsically disordered protein
regions in cell signaling, it is hypothesized that a case study on an exemplar cell
signaling protein family of homologous sequences will provide useful insights into the
relationship between conservation of dynamic properties and sequence conservation.
For this project, I have selected the Nuclear Factor Kappa-light-chain-enhancer of
Activated B cells (NFκB/Rel), a transcription factor protein family important for a
variety of processes including cell survival, inflammation and immunity[55-57]. This
project is part of a larger study exploring the function and role of NFκB in cell
signaling and immunity.
2 Literature Review
2.1 Transcription Factors
Transcription factors are a group of cell signaling proteins primarily involved in
transcriptional regulation, one of the key events of cell signaling responsible for gene
regulation and downstream protein expression[57]. These proteins play a pivotal role
as ‘central signaling hubs’ that carry and control the flow of information in biological
pathways from receptors to DNA[13]. Transcription factors regulate a variety of
diverse cellular and organismal processes[57]. Their high binding specificities,
coupled with tight regulation, have enabled transcription factors to process a huge
diversity of signal information with remarkable precision[57]. To date, the intricate
mechanisms of transcriptional regulation machinery have not been fully elucidated.
2.2 The NFκB Transcription Factor Family
The NFκB (Nuclear Factor Kappa-light-chain-enhancer of activated B cells) or Rel
protein family consists of a group of ubiquitously expressed, highly inducible and
structurally-related eukaryotic transcription factors[58]. They are involved in a large
variety of cellular and organismal processes, including the cellular stress response,
cell proliferation and survival, apoptosis, inflammation and innate and adaptive
immunity[55-57,59-61]. All NFκB transcription factors are related by a highly
conserved NH2-terminal Rel homology (RH) domain, responsible for DNA binding
and dimerization[58]. Based on their C-terminal sequences, these proteins can be
divided into two functionally distinct classes that are capable of heterodimerizing
freely[58].
There are five mammalian NFκB proteins: NFκB1 (p50/p105), NFκB2 (p52/p100),
RelA (p65), RelB and c-Rel[59]. The Class I proteins, including NFκB1 (p50/p105),
NFκB2 (p52/p100) and Drosophila Relish, contain a number of ankyrin repeats with
trans-repression activity at their C-terminus[59]. Class I proteins possess strong DNA
binding activity but weak transcriptional activation potential and are generally not
activators of transcription, except when they form heterodimers with Class II
proteins[59]. The Class II (Rel) proteins, including RelA (p65), RelB, c-Rel, v-Rel and
the Drosophila Dorsal and Dif proteins, in contrast, exhibit weak DNA binding
activity and contain a potent trans-activation domain at their C-terminus[59].
2.2.1 Mechanisms of Action of NFκB
NFκB proteins associate into homo- and heterodimers that bind to target κB sites,
DNA sequences of 9-10 base pairs[59]. The p50-RelA heterodimer represents the prototypical NFκB
complex and is the major NFκB complex found in most cells. The subunit
composition of the NFκB complex affects its DNA binding site specificity,
subcellular localization, trans-activation potential and mode of regulation, therefore
leading to combinatorial diversity of the downstream responses[58,62,63].
NFκB complexes are regulated via several pathways that control their translocation
from the cytoplasm to the nucleus, in response to extracellular stimuli[61,64]. To date,
at least three major signaling pathways have been identified: the IκB kinase
(IKK)-dependent canonical pathway, the IKK-dependent non-canonical pathway, and the
IKK-independent p38-CK2 pathway[61,64]. The IKK-dependent canonical pathway
involves the regulation of NFκB dimers containing RelA or c-Rel, through association
with a family of inhibitors known as IκBs (inhibitors of κB), which includes p100,
p105, IκBα, IκBβ, IκBγ, IκBε, IκBζ, Bcl-3 and the Drosophila Cactus protein[65].
IκBs typically inhibit the interaction of NFκB with DNA by blocking the DNA
binding sites of NFκB transcription factors[65]. IκB-NFκB interactions are, in turn,
mediated by the IκB kinase (IKK), a complex composed of the catalytic IKKα and
IKKβ subunits, and a regulatory subunit known as IKKγ or NEMO[61,64]. The IKK
complex, upon activation, phosphorylates two specific serine residues located in the
NH2-terminal regulatory domain of IκB, leading to IκB ubiquitination and
proteasome-mediated degradation[61,64]. NFκB dimers containing RelB and NFκB2 (p52/p100)
are activated through the IKK-dependent non-canonical pathway, where homodimeric
IKKα lacking the IKKγ (NEMO) subunit phosphorylates the C-terminal region of
p100[61,64]. This leads to the ubiquitination and degradation of the p100 IκB-like
C-terminal sequences, which in turn releases and activates p52-RelB[61,64].
The IKK-independent p38-CK2 pathway is activated by UV and the hepatitis B virus
trans-acting factor PX. Upon UV stimulation, IκBα proteins have been found to be
phosphorylated by CK2, leading to ubiquitination and degradation[61,64].
Recent evidence has also suggested that regulation of the NFκB pathway may involve
other processes such as ubiquitination, acetylation, prolyl isomerization (in the case of
RelA and p50), as well as phosphorylation (in the case of c-Rel and RelA)[58,61,66].
Activation of the NFκB complex results in its translocation from the cytoplasm to the
nucleus. This is mediated by specific nuclear localization signals present in the Rel
homology domain, which in turn binds to κB sites in the regulatory regions of inducible
promoters to activate target gene expression[58,61,66]. Similar to other
rapid-acting primary transcription factors, such as STATs (signal transducers and
activators of transcription), nuclear hormone receptors and c-Jun, NFκB transcription
factors can induce rapid changes in gene expression without the need for new protein
synthesis[58,61,66]. Promoter-bound NFκB activates target gene expression via the
assembly of enhanceosomes – large nucleoprotein complexes resulting from the
cooperative binding of regulatory elements, such as chromatin-remodeling proteins,
nuclear coactivators, kinases and histone acetylases[58,61,66].
2.2.2 NFκB in Human Diseases
NFκB transcription factors are involved in the upregulation of a variety of genes,
some of which are responsible for cell proliferation and cell survival[58,60]. Aberrant
inactivation of NFκB leads to increased susceptibility to apoptosis[60]. On the other
hand, aberrant activation of NFκB has frequently been observed in cancers, where it
stimulates the expression of gene clusters, including oncogenes, that promote cell
survival, inflammation, angiogenesis, tumor development, progression and
metastasis[67,68].
Activation of NFκB in cancer cells has been attributed to chronic stimulation of the
IKK pathway, as well as mutations in NFκB genes or their regulatory genes such as
IκB[67,68]. Potential cross-talk between IKK/NFκB and other major signaling
pathways implicated in cancer, including the mitogen-activated protein kinase
(MAPK), JAK/STAT (Janus kinase/signal transducer and activator of transcription),
p53 and phosphatidylinositol 3-kinase (PI3K) pathways, has also been
observed[67,68]. The involvement of NFκB-related pathways in cancers has led to
investigation of their use as potential biomarkers, as well as therapeutic targets[69,70].
In addition, NFκB proteins play an important role in both the innate and adaptive
immune responses by regulating a variety of processes. These include T-cell
development, maturation and proliferation upon activation of T-cell receptors; B-cell
development, survival, division and immunoglobulin expression; control of the
immune response; and malignant transformation[56,60,71-75]. NFκB transcription
factors perform various immune-related regulatory activities and function via the
differential activation of NFκB complexes in response to a diverse spectrum of
signals[56,60,71-75]. These signals are propagated from receptors including the
antigen receptors, pattern-recognition receptors and receptors for members of TNF
and IL-1 cytokine families[56,60,71-75]. Consequently, misregulation of NFκB
signaling machinery in the immune system has been associated with
immunodeficiency and inflammatory diseases[56,57,74]. Constitutive activation of
NFκB has been frequently observed in asthma, arthritis, renal inflammatory disease,
sepsis and many other diseases[56,57,74,76].
2.3 Computational analysis of NFκB proteins
Findings discussed in the previous sections were primarily gathered from experiments
using conventional laboratory techniques. To complement laboratory approaches,
computational approaches have also been utilized for experiments on NFκB proteins.
In silico methods, driven by technological advances leading to sophisticated
algorithms and the availability of experimental datasets, have sped up the acquisition
of meaningful information on NFκB proteins.
2.3.1 Systems analysis of NFκB signaling machinery
Systems biology, as an emerging field emphasizing “integrative” rather than
“reductionist” approaches, involves the inter-disciplinary study of interactions,
functions and behaviours of multi-component biological systems[77,78]. In this field,
complex data is integrated from various experimental platforms[77,78]. The field of
systems biology arises from the availability of large datasets from high throughput
microarray and genomic platforms, as well as advances in computational techniques,
which facilitate large-scale analysis of biological mechanisms, pathways and
networks[77,78]. To this end, computational biology has been identified as one of the
fundamental cornerstones of systems biology for the processing, interpretation and
manipulation of complex, large-scale multi-experimental datasets[77,78].
In the specific context of NFκB proteins, integrative systems biology approaches have
been used to identify and study their roles, as well as their downstream target genes,
in cellular pathways and networks[72,79-81]. These approaches yield useful insights
on the functions of NFκB proteins by utilizing tools, including computational
predictions, gene expression profiling, functional annotation from biological
databases and transcription factor binding site analysis, combined with experimental
validation via RNAi knockdown or other experiments[72,79-81].
Systems biology approaches complement conventional laboratory approaches for the
investigation of interactions between critical modules or components in cellular
pathways and networks. It has been established that genes and proteins do not
function in isolation, instead engaging in complex dynamic interactions to perform
their biological roles and functions[78,]. These interactions are in turn regulated by
mechanisms involving transcription factors, signaling pathways and networks. Whilst
conventional laboratory research has been instrumental for the identification of genes
and proteins critical for cellular processes such as NFκB transcriptional regulation,
systems biology approaches attempt to integrate data from various experimental
sources to obtain an all-encompassing view of how biological systems function as a
whole[72,79-81]. As the field of systems biology continues to grow and mature, more
exciting applications of large-scale, integrative approaches will contribute to and
reshape the landscape of knowledge discovery in NFκB research.
2.3.2 Sequence Analysis of NFκB
Besides research at the systems level, large-scale promoter sequence studies of NFκB
binding sites have also been conducted. Such studies aim to identify and
characterize conserved NFκB binding sites within sets of gene promoters[83,84].
These computational analysis efforts have in turn led to the development of
transcription factor databases and sophisticated algorithms for the
prediction of transcription factor binding sites (including κB sites)[85-88]. These have
proved useful in predicting the involvement of NFκB and its downstream target genes
in various biological pathways.
On-going bioinformatics sequence analyses, employing comparative genomics and
laboratory functional studies, have led to the identification of NFκB/Rel homologues
in various organisms since the discovery of NFκB by Sen and Baltimore in 1986. To date,
functionally conserved homologues of mammalian NFκB have been identified in a
variety of simpler organisms, including Drosophila melanogaster (fruit fly)[71,89],
Aedes aegypti (yellow fever mosquito)[90], Anopheles gambiae (malaria vector)[90],
Pinctada fucata (pearl oyster)[91], Litopenaeus vannamei (Pacific white
shrimp)[92,93], Cnidarians (sea anemones and corals)[94] and Porifera
(sponges)[59].
2.3.2.1 Structural Analysis of NFκB
Complementary to sequence analysis, structural analyses of NFκB proteins have also
been conducted via computational means. Following 3D structural determination of
NFκB complexes bound to DNA, experimental efforts have been channelled towards
elucidating the detailed binding mechanisms of NFκB complexes in relation to their
corresponding 3D structures [95-97]. Additionally, computational approaches
employing molecular modeling and simulations for the study of NFκB inhibitors[98],
κB DNA sites[99] and the evolution of DNA-binding and protein dimerization
domains[100] have been reported in the literature.
2.4 Protein Dynamics Analysis of NFκB
To date, only one protein dynamics study mentioning NFκB proteins is present in the
literature. The authors simulated the interaction between c-Rel and a 20-bp DNA
sequence and observed a unique and dynamic NFκB recognition site. The study
focused on the dynamics of the DNA, rather than the dynamics of the c-Rel protein
during binding[99]. However, the effects of protein dynamics in cell signaling and
allosteric control have been studied and reviewed in general[10,11,15,48-50].
2.4.1 Intrinsic Protein Disorder Analysis of NFκB
No intrinsic protein disorder analysis focusing solely on NFκB has been reported in
the literature. Nevertheless, general research efforts using intrinsic protein disorder to
identify protein binding sites[101,102] and analyse the functions of chromatin
remodeling proteins have been reported[22]. In the context of cell signaling, the
functional roles of intrinsic protein disorder in cytoplasmic signaling domains[22] and
in scaffold proteins, which integrate cell signaling pathways[15], have been reported.
The most relevant study of intrinsic protein disorder in transcription factors was
conducted by Wells et al., who analyzed p53’s intrinsically disordered N-terminal
trans-activation domain (TAD) using NMR spectroscopy and X-ray studies[14].
2.5 Limitations of reported studies
Based on the literature review, there appears to be limited research on the effects of
dynamic regions, or more specifically, intrinsically disordered protein regions, on the
function of NFκB transcription factors.
Furthermore, general research efforts on NFκB are mostly focused on specific classes,
types or states of NFκB proteins. Thus, they seem to provide only isolated, contextual
views of the NFκB signaling machinery. Clearly, a general macroscopic overview of
the functional role of protein dynamics in NFκB proteins, across all known subclasses
and organisms, is lacking.
2.6 Research Aims and Objectives
In Section 1.4, I proposed the hypothesis that dynamic properties of proteins,
particularly cell signaling proteins, may contribute to their function and thus may be
evolutionarily conserved. For this thesis, using NFκB transcription factors as an
exemplar, my research aim was to computationally analyse the conservation of
protein dynamics in this protein family and the functional effects that result. In
Section 1.1, it was highlighted that protein dynamics typically occur at two levels –
movements of intrinsically disordered protein regions, as well as local internal and
global external motion occurring at larger amplitudes[7,9]. The primary focus of my
research was on protein dynamics occurring in intrinsically disordered protein
regions.
To systematically achieve my research aim, firstly, there was a need for the
development of an in silico tool for large-scale identification of intrinsically
disordered residues. Next, NFκB sequence and structure data had to be collected and
stored in an online database. Subsequently, residues predicted to be disordered in
NFκB protein sequences would be subjected to analyses of their conservation,
localization on 3D protein structures and potential biological functions.
Specific objectives have been laid out for each phase of the research project, as
follows:
- To develop an efficient system for large-scale identification of intrinsically
  disordered regions in proteins.
- To collect high quality NFκB sequence and structure data.
- To develop a specialized database of NFκB protein sequences and structures
  for the benefit of the research community.
- To implement the developed prediction system and relevant analysis tools to
  analyse the conservation and functional roles of intrinsically disordered
  protein residues in NFκB signaling machinery.
For my research project, an in silico approach was adopted since large-scale data
mining and analysis were an integral part of the project. In silico approaches speed up
these procedures to promote knowledge discovery and provide useful leads for
experimental validation.
The methodology and findings, discussed in the next chapters, will lay the foundation
for further research in the field of protein dynamics, as well as transcriptional
regulation and cell signaling, potentially leading to significant contributions to
research in cell signaling.
3 DisBatch: A Faster Meta-Prediction System for Large-Scale Identification of Intrinsically Disordered Protein Regions
3.1 Background
The identification of intrinsically disordered protein regions facilitates high
throughput structural determination, since these relatively unstructured and flexible
regions are reported to hamper protein purification and crystallization[34].
Additionally, intrinsically disordered regions have been known to be important for
protein function, through roles such as the presentation of protein modification sites
and the modulation of flexibility and specificity in protein-protein interactions[26].
Evidence has shown the evolutionary conservation and maintenance of protein
disorder to be non-trivial and non-random, suggesting functional significance[26,52-54].
Recently, computational methods, based on various sequence and structural features
in intrinsically disordered regions, have played an increasing role in the identification
of intrinsic protein disorder. In particular, meta-predictors that combine the results of
multiple primary prediction methods have been extensively applied due to higher
prediction accuracies[38]. Nevertheless, most meta-predictors reported are limited in
terms of availability and scalability. Many are slow, unavailable locally and impose
practical restrictions on the number of submissions by users, posing difficulties for
large-scale batch sequence predictions. For example, GeneSilico MetaDisorder
MD2[41], the best disorder prediction method in CASP8 & CASP9[45], utilizes 15
primary disorder predictors and takes an average of 3 days for the prediction of 1-5
protein sequences, with a limitation of 10 jobs per day. Furthermore, the software is
also not available for local use. These constraints greatly limit the ability of the
scientific community to perform large scale protein disorder analysis.
In view of these limitations, I have developed a lightweight disorder meta-predictor
designed for rapid, fully automated large-scale disorder analysis from protein
sequences. The prediction system, named DisBatch (available at
http://bioslax01.bic.nus.edu.sg/meta/), demonstrates performance comparable to
GeneSilico MetaDisorder MD2, but with more than 10x speedup. The DisBatch
meta-predictor is now available both as a web service and as a local software package.
3.2 Materials and Methods
3.2.1 Server Infrastructure
DisBatch was written using a combination of Bash, Perl and R scripts. The
meta-prediction software was developed and hosted in the BioSlax 7.5 live operating
system (http://www.bioslax.com), developed by the Bioinformatics Centre in the
National University of Singapore (NUS), based on the Slax (http://www.slax.org)
Slackware Linux base distribution. BioSlax contains a suite of bioinformatics tools
(known as modules), which can be booted from any PC using the computer’s
memory. The operating system also allows for easy addition of new modules
containing additional software, services and settings, which can similarly be loaded
and activated upon boot-up. The BioSlax server running DisBatch consists of a
front-end web portal and a Cloud-based backend. The Cloud backend server runs the
BioSlax virtual machine using a Citrix Xen® hypervisor.
3.2.2 Primary Disorder Predictor Selection
Primary disorder predictors were first selected based on their availability and
scalability. Chosen predictors were required to allow for either i) software download
for local use, or ii) if used remotely as a web service, unrestricted number of
submissions by each user per day. Selected predictors include i) DisEMBL
REMARK465[18], ii) FoldIndex[32] and iii) PrDOS[38]. Information on these
disorder predictors was presented previously in Section 1.3.2.1.
3.2.3 Meta-predictor Development
The performance of each primary predictor was evaluated against Release 5.7 of the
DisProt dataset[46], which contains sequences annotated with experimentally verified
intrinsically disordered regions, to determine the optimum threshold with the highest
accuracy. The DisProt testing set was checked for the presence of NFκB records and
none were observed. Five candidate meta-predictors were built from the possible
combinations of primary predictors at their optimum thresholds, where the accuracy is
highest.
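The threshold selection described above can be sketched as a simple sweep over candidate cut-offs. The function names, the candidate grid and the use of "score ≥ threshold" as the disorder call are assumptions made for illustration; the thesis scripts themselves (Bash, Perl and R) are not reproduced here.

```python
def accuracy(scores, labels, threshold):
    """Fraction of residues whose thresholded score matches the label.

    labels use 1 for disordered and 0 for ordered residues; a residue is
    called disordered when its score is >= threshold (an assumption).
    """
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def best_threshold(scores, labels, candidates):
    """Return (threshold, accuracy) with the highest accuracy.

    Ties are resolved in favour of the first candidate listed.
    """
    return max(((t, accuracy(scores, labels, t)) for t in candidates),
               key=lambda pair: pair[1])
```

For example, `best_threshold(scores, labels, [0.1, 0.2, 0.3])` evaluates each cut-off on one sequence; averaging the per-sequence accuracies over a whole dataset before choosing the cut-off would follow the same pattern.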
Both the DisEMBL REMARK465[18] and PrDOS[38] predictors convert their results to
probability scores; their outputs were therefore combined by averaging or weighted
averaging. Weights for the meta-predictor integrating DisEMBL REMARK465[18]
and PrDOS[38] were assigned according to their Matthews correlation coefficient
(MCC) values[103]. Accuracy values were not used for weighting since both tools
yield almost equal accuracy at their optimum thresholds.
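A minimal sketch of weighted averaging of two per-residue probability tracks is given below. Normalizing the weights by their sum is an assumption, since the exact weighting scheme is not spelled out in the text, and the function name is hypothetical.

```python
def weighted_average(scores_a, scores_b, weight_a, weight_b):
    """Per-residue weighted mean of two probability tracks of equal length.

    weight_a and weight_b would be the MCC values of the two predictors;
    they are normalized by their sum here (an assumption).
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("score tracks must have the same length")
    total = weight_a + weight_b
    return [(weight_a * a + weight_b * b) / total
            for a, b in zip(scores_a, scores_b)]
```

With equal weights this reduces to the simple average, so the same function covers both combination modes mentioned above.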
FoldIndex[32] rearranges Uversky et al.’s fold boundary equation to calculate the
prediction score. The default window size of 51 was used for disorder
prediction[32]. According to the modified equation, positive FoldIndex[32] scores
indicate probable folded proteins or regions and negative FoldIndex scores indicate
likely disordered proteins or regions. Since FoldIndex[32] does not yield probability
scores, the original scores were converted to binary values at each position. Positive
FoldIndex[32] scores representing predicted folded residues were assigned a value of
0, while negative scores representing predicted disordered residues were assigned a
score of 1. Due to the difference in scoring systems, the probability scores returned
from DisEMBL REMARK465[18] and/or PrDOS[38] were combined with the
FoldIndex[32] output by simple addition for all relevant meta-predictors.
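The rearranged equation is not reproduced in this section. As a hedged illustration, the sketch below uses the form in which the FoldIndex score is commonly stated, 2.785⟨H⟩ − |⟨R⟩| − 1.151, where ⟨H⟩ is the mean Kyte-Doolittle hydrophobicity rescaled to the range 0-1 and ⟨R⟩ is the mean net charge. The function names are hypothetical, the score is computed for a whole segment rather than with the 51-residue sliding window, and treating a score of exactly zero as ordered is an assumption.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
      "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
      "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
      "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}

# Formal charges; histidine is treated as neutral here (an assumption).
CHARGE = {"K": 1, "R": 1, "D": -1, "E": -1}


def foldindex_score(segment):
    """Fold boundary score for one segment; negative suggests disorder."""
    n = len(segment)
    mean_h = sum((KD[aa] + 4.5) / 9.0 for aa in segment) / n
    mean_r = abs(sum(CHARGE.get(aa, 0) for aa in segment)) / n
    return 2.785 * mean_h - mean_r - 1.151


def disorder_flag(score):
    """Binary conversion described in the text: 1 = disordered, 0 = folded."""
    return 1 if score < 0 else 0
```

A highly charged, weakly hydrophobic segment therefore receives a flag of 1, while a strongly hydrophobic, uncharged segment receives a flag of 0.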
The optimum threshold of each meta-predictor yielding the highest accuracy was
determined. The best performing meta-predictor is the combination of FoldIndex[32]
and PrDOS[38], at the threshold of 1.5, with positive prediction by both tools
(FoldIndex[32] binary score of 1 and PrDOS[38] probability cutoff score of ≥ 0.5 for
predicted intrinsically disordered residues).
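The combination rule can be sketched as follows. Because the FoldIndex output is binary and the PrDOS output lies between 0 and 1, a sum of at least 1.5 is reached only when FoldIndex predicts disorder and the PrDOS probability is at least 0.5. The function name is hypothetical and the code is an illustration, not the DisBatch implementation.

```python
def combine_calls(foldindex_scores, prdos_probs, threshold=1.5):
    """Per-residue disorder calls from FoldIndex scores and PrDOS probabilities.

    FoldIndex scores are first converted to binary values (negative -> 1,
    otherwise 0), then added to the PrDOS probability and thresholded.
    """
    calls = []
    for fi_score, prob in zip(foldindex_scores, prdos_probs):
        binary = 1 if fi_score < 0 else 0
        combined = binary + prob
        calls.append(1 if combined >= threshold else 0)
    return calls
```

A residue with a high PrDOS probability but a positive FoldIndex score sums to less than 1, so it is never called disordered at this threshold.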
3.2.4 Performance Evaluation
Due to low prediction speed and submission restrictions on the MD2 server, only 286
out of 638 sequences from the DisProt[46] dataset were predicted successfully over a
period of 2 months. For fair comparison, the performance of each predictor was
compared against GeneSilico MetaDisorder MD2[41], the best disorder prediction
method in CASP9[105], using this subset.
3.2.5 Performance Measures
Performance measures used were sensitivity (SE), specificity (SP), accuracy (ACC),
positive predictive value (PPV) and negative predictive value (NPV). These were
calculated based on the number of true positives (TP), true negatives (TN), false
positives (FP) and false negatives (FN). TP and TN denote the number of known
disordered amino acid residues and ordered residues predicted correctly, respectively.
FP represents ordered residues predicted to be disordered, while FN represents known
disordered residues predicted to be ordered.
SE = TP/(TP+FN) and SP = TN/(TN+FP) represent the proportions of correctly predicted
disordered amino acid residues and ordered residues in each protein sequence,
respectively. ACC = (TP+TN)/N, where N represents the total number of residues in
each protein sequence, is a measure of the proportion of all correctly predicted
residues (disordered and ordered) in each protein sequence. PPV = TP/(TP+FP)
indicates the proportion of positively predicted residues (TP + FP) that are correctly
predicted as disordered (TP), while NPV = TN/(TN+FN) indicates the proportion of
negatively predicted residues (TN + FN) that are correctly predicted as ordered (TN).
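MCC measures the correlation between predicted and known residue states, and thus
how far the prediction departs from random, and is calculated as:
MCC = (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN))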
The MCC value ranges between -1 and 1: MCC = 1 for 100% agreement of the
prediction, MCC = 0 for completely random prediction and MCC = -1 for 100%
disagreement of the prediction. SE, SP, ACC, PPV, NPV and MCC for each sequence
in the testing set were calculated, summed and averaged over the total number of
sequences.
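The measures defined above can be computed for one sequence as in the sketch below, using the standard MCC definition, (TP×TN − FP×FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The function names are hypothetical, and returning NaN when a denominator is zero is an assumption, since the handling of undefined ratios is not described in the text.

```python
import math


def confusion_counts(predicted, observed):
    """Count TP, TN, FP, FN for per-residue labels (1 = disordered)."""
    pairs = list(zip(predicted, observed))
    tp = sum(1 for p, o in pairs if p == 1 and o == 1)
    tn = sum(1 for p, o in pairs if p == 0 and o == 0)
    fp = sum(1 for p, o in pairs if p == 1 and o == 0)
    fn = sum(1 for p, o in pairs if p == 0 and o == 1)
    return tp, tn, fp, fn


def _ratio(numerator, denominator):
    # Undefined ratios are reported as NaN (an assumption).
    return numerator / denominator if denominator else float("nan")


def performance(predicted, observed):
    """Return SE, SP, ACC, PPV, NPV and MCC for one sequence."""
    tp, tn, fp, fn = confusion_counts(predicted, observed)
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "SE": _ratio(tp, tp + fn),
        "SP": _ratio(tn, tn + fp),
        "ACC": _ratio(tp + tn, tp + tn + fp + fn),
        "PPV": _ratio(tp, tp + fp),
        "NPV": _ratio(tn, tn + fn),
        "MCC": _ratio(tp * tn - fp * fn, mcc_denominator),
    }
```

Dataset-level values would then be obtained by averaging each measure over all sequences, as described above.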
3.2.6 Web Interface
A Web interface was set up to facilitate online access to DisBatch (FoldIndex[32] +
PrDOS[38]) at http://bioslax01.bic.nus.edu.sg/meta. DisBatch accepts sequences in
FASTA format as input. Unix, Perl and R commands used in DisBatch are called
remotely from CGI scripts written in Bash, which in turn submit and retrieve
predictions from the FoldIndex and PrDOS servers. Due to limitations in
computational resources, a maximum of 50 sequences is allowed per submission. For
large-scale disorder predictions, users can download the DisBatch software for free.
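As an illustration of the input handling described above, the sketch below parses FASTA text and enforces the 50-sequence limit per submission. It is written in Python for readability; the actual DisBatch CGI scripts are written in Bash, and the function name and error messages are hypothetical.

```python
MAX_SEQUENCES = 50  # per-submission limit stated in the text


def parse_fasta(text, limit=MAX_SEQUENCES):
    """Return a list of (header, sequence) tuples from FASTA-formatted text.

    Raises ValueError if sequence data precedes the first header or if
    more than `limit` sequences are supplied.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
        else:
            raise ValueError("sequence data found before the first FASTA header")
    if header is not None:
        records.append((header, "".join(chunks)))
    if len(records) > limit:
        raise ValueError(f"{len(records)} sequences submitted; the limit is {limit}")
    return records
```

Rejecting an oversized submission before any prediction is requested avoids placing unnecessary load on the remote FoldIndex and PrDOS servers.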
3.3 Results
3.3.1 Predictive Performance
I have successfully developed DisBatch, a lightweight meta-predictor optimized
using two primary predictors, FoldIndex[32] and PrDOS[38], to automate large-scale
batch disorder predictions. DisBatch combines the prediction output of
FoldIndex[32] and PrDOS[38] by simple addition.
DisBatch gives the best accuracy value of 67.79% when the threshold is set to 1.5,
where there is an agreement of positive prediction from FoldIndex[32] (binary score:
1) and positive prediction from PrDOS[38] (probability score: ≥ 0.5). DisBatch
(67.79% accuracy) slightly outperforms all other primary and meta-predictors selected
and tested in this study and is comparable to GeneSilico MetaDisorder MD2’s[41]
accuracy of 69.21% (Table 1 and Figure 2). Standard error estimates in Figure 2
indicate that the performance improvement of DisBatch may not be significant.
Nevertheless, DisBatch performs predictions faster (with more than 10x speedup)
when compared to MD2[41]. The average prediction rate of DisBatch is 10 minutes
per sequence (dependent on the server load and prediction speed of PrDOS[38]) while the
average prediction rate of MD2[41] is 3 days per 1-5 sequences.
Table 2. Performance comparison of primary and meta-predictors for disorder prediction at their respective
optimum thresholds. The predictive performance of MetaDisorder MD2 and P+F (DisBatch) is marked with an asterisk (*).

Disorder Predictor        Threshold   Accuracy
Primary Predictor
  DisEMBL                 0.5         66.04%
  PrDOS                   0.6         67.08%
  FoldIndex               NA          64.75%
Meta-Predictor
  GeneSilico MD2 *        NA          69.21%
  P+D                     0.5         67.63%
  P+D (MCC-weighted)      0.5         67.69%
  D+F                     1.4         66.83%
  P+F (DisBatch) *        1.5         67.79%
  P+D+F                   1.9         67.75%
Figure 2. A) Bar plot of mean accuracy values of primary and meta disorder predictors at their respective optimum
thresholds, with standard error estimates. B) Boxplot of accuracy values of primary and meta disorder predictors at
their respective optimum thresholds. Each boxplot depicts the minimum accuracy value, lower quartile, median,
upper quartile, maximum accuracy value and any outlier observation(s) for each predictor. The boxplot for
MetaDisorder MD2 and P+F (DisBatch) is highlighted in grey.
3.3.2 Features
The DisBatch web interface is intuitive and consists of a simple search page where
users can input a maximum number of 50 sequences in FASTA format (Figure 3).
Since the prediction speed of DisBatch is dependent on the server load of PrDOS[38],
users are able to set a timeout value, in terms of the number of minutes per sequence,
for time-efficient prediction. If PrDOS[38] results are not fetched after the timeout
value, DisBatch will only display prediction results from FoldIndex[32] in the output
page. To further support large-scale analysis, DisBatch advises users to contact
PrDOS[38] directly if the server response is slow or the input dataset is large.
Users can check their prediction status by clicking on the “Check Prediction Status”
button after submitting their sequences. Upon successful prediction, DisBatch
generates a number of output files (Figure 4). Firstly, it provides raw prediction
outputs from FoldIndex[32] and PrDOS[38] in their original format. DisBatch also
returns its meta-prediction output score (the sum of the FoldIndex[32] binary score
and the PrDOS[38] probability score) at each residue position in the sequence, in
comma separated values (CSV) format. Lastly, the summed meta-prediction score of
DisBatch is converted to the number of votes at each position and formatted as
CSV. The minimum number of votes is 0, when FoldIndex[32] returns a positive value
(predicted folded) and PrDOS[38] returns a probability score of less than 0.5, while the
maximum number of votes is 2, when both primary predictors agree on a positive
prediction of a residue being potentially intrinsically disordered. A help page is
provided, with prominent hyperlinks at the home page and output page, for further
explanation of the DisBatch output files.
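The relationship between the summed meta-prediction score, the vote count and the 1.5 threshold can be illustrated with a short sketch. It assumes that the FoldIndex output has already been discretised to 1 (predicted disordered) or 0 (predicted folded) and that the threshold comparison is inclusive; the function names are hypothetical and do not come from the DisBatch package.

```python
def meta_scores(foldindex_binary, prdos_probability):
    """Sum the FoldIndex binary score and the PrDOS probability at each residue."""
    return [f + p for f, p in zip(foldindex_binary, prdos_probability)]

def vote_counts(foldindex_binary, prdos_probability, prdos_cutoff=0.5):
    """Count how many of the two primary predictors call each residue disordered."""
    return [
        f + (1 if p >= prdos_cutoff else 0)
        for f, p in zip(foldindex_binary, prdos_probability)
    ]

def call_disordered(scores, threshold=1.5):
    """Assumption: a residue is called disordered when its score reaches the threshold."""
    return [1 if s >= threshold else 0 for s in scores]
```

With a FoldIndex score of 0 or 1 and a PrDOS probability between 0 and 1, a summed score of at least 1.5 is reached only when FoldIndex scores 1 and PrDOS scores at least 0.5, which is why the 1.5 threshold corresponds to two votes.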
Figure 3. Sequence submission page of DisBatch. DisBatch is available at http://bioslax01.bic.nus.edu.sg/meta/.
Figure 4. Output page of DisBatch. The page provides download links for each output file, and a link to the help
page at the bottom of the page.
The DisBatch web server only accepts a maximum number of 50 sequences per
submission, to avoid server overload. For larger-scale predictions, DisBatch is
available as a local Unix package, downloadable from the web interface. Besides the
installation files, detailed documentation with full installation and usage instructions
can be found in the download page (http://bioslax01.bic.nus.edu.sg/meta/download.php).
3.4 Discussion
Availability and scalability pose severe limitations for most disorder meta-predictors
reported in literature, hindering their use in large-scale predictions from protein
sequences. In this study, it has been demonstrated that a relatively lightweight
predictor can be utilized for fast, automated, large-scale disorder predictions, with
comparable performance to highly accurate meta-disorder predictors. This is
important because quick, accurate predictors promote large-scale protein disorder
analysis of proteins and their functions. To date, such large-scale studies have not
been extensively reported in literature, in part due to restrictions set by current high-performance meta-predictors.
3.4.1 Predictive Performance
The results indicate that DisBatch produces slightly higher accuracy at its optimum
threshold, when compared to other primary and meta-predictors examined in this
study. This slight gain in predictive performance matters most with
large input datasets, which DisBatch is designed specifically to cater for.
Nevertheless, GeneSilico MetaDisorder MD2[41] is still recommended for small-scale disorder predictions, since it yields higher accuracy. The results, however,
suggest that accuracy varies over a very wide range among the predictors used in this study,
and it is unclear whether low-scoring hits
correlate across the predictors.
3.4.2 Scoring Algorithm
Discretisation of FoldIndex scores and the addition of these scores to probability
scores returned by other primary predictors in DisBatch may have introduced
problems and artefacts that may affect the accuracy of the meta-predictor. As such,
the performance of DisBatch may be improved by seeking alternatives to
discretisation, such as converting the empirical distribution from FoldIndex of a large
number of sequences into a probability estimate.
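One possible form of the alternative suggested above is a rank-based estimate built from an empirical distribution of raw FoldIndex scores. The sketch below is only an illustration of that idea: it assumes that lower (more negative) raw scores indicate unfolded regions, the function names are hypothetical, and the returned value is a probability-like rank rather than a calibrated probability.

```python
from bisect import bisect_left

def make_disorder_estimator(background_scores):
    """Build an estimator from raw scores collected over many sequences.

    Assumption: lower raw scores indicate disorder, so a residue's estimate is
    the fraction of background scores that are greater than or equal to it.
    """
    ordered = sorted(background_scores)
    total = len(ordered)
    if total == 0:
        raise ValueError("background_scores must not be empty")

    def estimate(score):
        below = bisect_left(ordered, score)  # background scores strictly below
        return (total - below) / total

    return estimate
```

An estimate of this kind lies between 0 and 1 and could be added to the PrDOS probability score in place of the discretised 0/1 value.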
3.4.3 Benchmark Model
The performance of DisBatch was benchmarked against the performance of
GeneSilico MetaDisorder MD2[41], claimed to be the best disorder prediction
method in CASP9. In addition, GeneSilico MetaDisorder MD2 is available without
restrictions for large-scale disorder prediction, albeit at a slow speed. Other meta-predictors considered as benchmark models are not scalable to large-scale prediction
and thus their predictive performance was not evaluated. For example, PONDR-FIT[43]
only allows manual submission through its web page, while metaPrDOS[42]
restricts submission to fewer than 10 sequences per hour and recommends PrDOS for
large-scale predictions instead.
3.4.4 Testing Dataset
With regard to the accidental DisProt subset, comprising 286 sequences, used for
testing the predictors, it is difficult to guarantee that the sequences in the dataset were
not used by any of the predictors as part of their training dataset. DisEMBL[18] and
PrDOS[38] use PDB[37] structural files as the positive and negative training set,
while FoldIndex[32] mines literature information for the positive training set
consisting of intrinsically unfolded proteins and data from PDB[37] as the negative
training set. Since the DisProt[46] dataset was compiled from published experimental
data, it may overlap with the training set used by the primary predictors selected in
this study, especially FoldIndex[32].
In addition, the DisProt testing set may not represent a genomic sample of disordered
protein sequences. Other possible sources of dataset bias have also been identified in
this study. Firstly, the representativeness of the accidental DisProt[46] subset has not
been evaluated to ensure that it is of sufficient quality for performance evaluation
purposes. Therefore, the subset may be biased in terms of protein length and/or
protein family and species representation. Similarly, the ratio of ordered and
disordered residues in the dataset was not determined and therefore the proportion
may also be biased, hindering objective performance evaluation.
These problems can possibly be overcome by adopting an iterative approach to curate
more intrinsically disordered regions in proteins. In addition, performance evaluation
of disorder predictors can also be conducted on a larger set of data representative of a
complete complex genome, such as the metazoan genome.
3.4.5 Software Limitation
One major limitation of meta-predictors like DisBatch is their reliance on remote
primary prediction server(s). Since both local and online versions of DisBatch use
PrDOS[38], the speedup of DisBatch is largely dependent on PrDOS'[38] prediction
speed and this makes DisBatch vulnerable to the idiosyncrasies of PrDOS[38].
Similarly, FoldIndex[32] results are retrieved through connection with the FoldIndex
server, albeit at a significantly faster speed (≈ 3 seconds per sequence on average)
compared to PrDOS’[38] average of 10 minutes per sequence.
DisBatch returns results from FoldIndex[32] as an alternative if the PrDOS[38] server
is facing technical difficulties. Nevertheless, other meta-predictors were demonstrated to
have results comparable to those of DisBatch in this study. In particular, the DisEMBL[18]
and FoldIndex[32] (D+F) meta-predictor yields 66.83% accuracy as compared to
DisBatch’s 67.79%. This meta-predictor can be considered as another alternative. One
advantage of this alternative is that DisEMBL[18], unlike PrDOS[38], can be
executed offline without any server connection.
3.5 Future Work
To address pitfalls pertaining to predictive performance, dataset and software,
highlighted in Section 3.4, rigorous investigation into the benchmark testing dataset is
necessary to ensure objective performance evaluation. More in-depth examination of
the testing dataset, as well as the training datasets of the primary predictors, should be
carried out in the future to lend support to the performance advantage of DisBatch.
Future work can include analyses on protein length, protein families and species
covered in the testing set, as well as the correlation of high and low-scoring hits, and
the proportion of ordered and disordered residues. Furthermore, blind CASP[105]
datasets not present in the training set of all primary predictors can also be used as the
testing set, to eliminate the problem of dataset bias.
More importantly, the prediction and scoring algorithm in DisBatch can be improved
to yield higher accuracy. As discussed, the scoring algorithm can be revised to include
only probability scores. On the other hand, innovative prediction algorithms can be
explored and incorporated into DisBatch.
Future improvements to the DisBatch web service are also necessary to improve
usability. The speedup of DisBatch compared to other meta-prediction tools available
can be quantified in terms of initial, transitional and terminal stages. The prediction
service can be configured to return results by e-mail for greater user convenience.
Also, interactive visualization and analysis tools such as plots and annotated sequence
views can be provided by the web server to further facilitate meaningful large-scale
analysis.
3.6 Chapter Conclusion
This study addresses the problem of using meta-predictors to predict intrinsically
disordered protein regions. High-performing meta-predictors like GeneSilico's
MetaDisorder MD2[41] are slow and pose access limitations. Hence, I propose an
alternative meta-predictor, DisBatch, which is much faster and has comparable
performance. DisBatch is available both as a web service and local software. Despite
the limitations raised in this chapter, the study represents a call for the development of
large-scale disorder predictors with a more balanced performance-to-time ratio. Such
powerful predictors will further drive research in intrinsic protein disorder and lend
crucial applications to the elucidation of the biological functions of intrinsically
disordered residues, which is the focus of my project.
4 NFκB Base: A Specialized Database of NFκB Proteins
4.1 Background
NFκB transcription factors play a critical role in transcriptional activation and are
associated with a wide range of important cellular processes involving cell
proliferation and survival and the immune response[65,106-109]. In addition to
experimental and structural data, protein sequences of NFκB have important research
value for functional characterization of the protein family[83,90]. Numerous studies
have been performed with NFκB sequences, as outlined in Section 2.3.2.
To date, while a large number of NFκB protein sequences can be found in major
sequence databases, there is no specialized, publicly accessible database specifically
for NFκB protein sequences, though datasets containing NFκB target genes are
available online (http://people.bu.edu/gilmore/nf-kb/target/index.html). This presents
a need for a centralized repository containing an annotated dataset of NFκB protein
sequences to fill the gap in current resources on NFκB.
In this chapter, I present NFκB Base, a specialized database of experimentally
verified, manually curated NFκB protein sequences. The database is integrated with
analysis tools including (i) dynamic data display, (ii) keyword search and (iii) Basic
Local Alignment Search Tool (BLAST) sequence search. The main aim of NFκB
Base is to support and facilitate large-scale sequence and functional studies of NFκB
proteins. NFκB Base is available at http://bioslax01.bic.nus.edu.sg/nfkb/.
4.2 Materials and Methods
4.2.1 Server Infrastructure
Similar to DisBatch, NFκB Base was developed and hosted in the BioSlax 7.5 live
operating system (http://www.bioslax.com), using the Linux-Apache-MySQL-PHP
(LAMP) software stack.
4.2.2 Sequence Data Collection
Keyword and sequence similarity searches were performed against NCBI Protein
(GenBank Flat File Release 177.0, Release Date: April 15, 2010)[110], UniProt [52]
(Release 2010_06, published May 18, 2010)[111] and PDB (Release date: June 1,
2010)[37]. Sequence similarity searches were performed using the Basic Local
Alignment Search Tool (BLAST+) software, version 2.2.23[112]. Literature
information was also mined from sequence records.
4.2.2.1 Inclusion and Exclusion Criteria
All information was manually filtered and verified, according to keywords and
literature, to remove irrelevant and hypothetical records. Duplicates for each record
were also identified and recorded. The inclusion and exclusion criteria used during the
filtering and curation process for records in NFκB Base are documented in Figure 5.
Figure 5. Detailed sequence inclusion and exclusion criteria for records in NFκB Base.
4.2.3 Database Design
Collected and manually reviewed NFκB sequence data was stored in the MySQL
Relational Database Management System. Each entry is assigned a unique accession
number, beginning with the NFκB protein type and followed by a unique 5-digit serial
number. A typical entry contains annotated information, where available, on (i) source
accession number, (ii) NFκB protein type, (iii) protein name and description, (iv)
scientific and common name of source organism, (v) gene name, (vi) chromosome
name, (vii) sequence length, (viii) sequence, (ix) accession number(s) of duplicate
record(s) and (x) relevant cross-links to major databases. Cross-references link to the
NCBI Protein[110], UniProt[111], Gene Ontology (GO)[113], HUGO Gene
Nomenclature Committee (HGNC)[114], InterPro[115], PDB[37], PubMed[116] and
the NCBI Taxonomy databases[116], where additional sequence, function, gene
nomenclature, protein domain and family, structural, literature and taxonomy
information can be found, respectively. These links are included to further increase
NFκB information coverage.
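The accession scheme described above (protein type followed by a 5-digit serial number) can be sketched as follows. The exact string layout used by NFκB Base is not given in the text, so the format produced here, such as RelA00042, is an assumption for illustration, as are the function names.

```python
import re

# Protein types recorded in NFκB Base, as listed in this chapter.
PROTEIN_TYPES = ("C-Rel", "Dorsal", "Dif", "NFκB1", "NFκB2", "RelA", "RelB", "Relish")

def make_accession(protein_type, serial):
    """Build an accession: protein type followed by a zero-padded 5-digit serial."""
    if protein_type not in PROTEIN_TYPES:
        raise ValueError("unknown NFκB protein type: %r" % protein_type)
    if not 0 < serial <= 99999:
        raise ValueError("serial number must fit in 5 digits")
    return "%s%05d" % (protein_type, serial)

def parse_accession(accession):
    """Split an accession into (protein_type, serial); the last 5 digits are the serial."""
    match = re.match(r"^(.+?)(\d{5})$", accession)
    if match is None or match.group(1) not in PROTEIN_TYPES:
        raise ValueError("not a valid accession: %r" % accession)
    return match.group(1), int(match.group(2))
```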
4.2.4 Web Interface
The web interface for NFκB Base was constructed with HyperText Markup Language
(HTML), PHP, web BLAST, Perl CGI scripts and the jQuery library. HTML was used
to present web page content, while PHP was used for database server connection for
search queries and entry display. The web BLAST Perl CGI was used to perform
online BLAST searches by calling the local BLAST package and BLAST databases.
The jQuery library, based on Asynchronous JavaScript and XML (AJAX), allows for
dynamic table browsing and display. It is designed to present a quick and concise
view of the database, displaying only important fields, including NFκB and source
accession numbers with relevant hyperlinks to full entry records, organism name and
protein descriptions dynamically, for simple navigation of the database.
4.2.5 Results
4.2.5.1 NFκB Base Content
The latest release of NFκB Base (Beta 2.0) contains 413 records of experimentally
verified protein sequences within the eukaryotic NFκB family. There are 22 C-Rel
records, 41 Dorsal records, 29 Dif records, 70 NFκB1 records, 59 NFκB2 records, 95
RelA records, 19 RelB records and 71 Relish records (Figure 6). These records were
collected from major sequence and structure databases, including NCBI Protein[110],
UniProt[111] and PDB[37], and were subsequently filtered and reviewed manually.
Each record was assigned a unique accession number containing information on the
protein type and serial number. In addition, each record is linked to a detailed entry
page, where all annotated data is organized in fields (Figure 7).
4.2.5.2 Features
4.2.5.2.1 Keyword Search
NFκB Base supports keyword search, through the integration of a search query box on
the top of each page. Users can query the database based on specific fields, including
NFκB Base accession number, source database accession number, Gene Identifier
(GI) number[116], protein description, gene name and organism name. For more
general searches, users can also look up a keyword in all fields of the database. Search
results are displayed in tabular format, with basic information including NFκB Base
accession number, source database accession number, organism name and protein
description (Figure 8). The accession number fields are hyperlinked to the NFκB
Base entry page and source database entry page.
Alternatively, users can browse the database dynamically, in the same tabular view as
the search output page (Figure 9). Users can customize the number of entries to
display and perform a keyword search on these fields dynamically. Each displayed
field can also be sorted dynamically, in ascending or descending order.
Figure 6. Number of records present in NFκB Base (Release: Beta 2.0) for each NFκB protein type. NFκB Base is
available at http://proline.bic.nus.edu.sg/~shenjean/nfkb/.
Figure 7. A typical entry page of NFκB Base. Each entry contains information, where available, on source
accession, NFκB protein type, description, organism, gene name, chromosome name, sequence length, accession
number(s) of duplicate record(s) and cross-links to major online databases, including NCBI Protein (sequence
database), UniProt (sequence database), GO (Gene Ontology database), HGNC (gene nomenclature database),
InterPro (protein domain and family database), PDB (protein structure database), PubMed (literature database) and
NCBI Taxonomy (taxonomy database).
Figure 8. Sample keyword search output of NFκB Base, displaying the accession number, source accession
number, organism and description fields. NFκB Base supports keyword searches in all or specific fields, where
users can submit a query at the top of every page, shown in the upper frame of this figure.
Figure 9. The Browse page of NFκB Base with jQuery supported dynamic data search and display.
4.2.5.2.2 Sequence Similarity Search
Besides keyword searches, NFκB Base also integrates the BLAST[112] tool for
sequence similarity searches against all or specific types of NFκB proteins (C-Rel,
Dorsal, Dif, NFκB1, NFκB2, RelA, RelB and Relish) recorded in the database.
BLAST[112] is a local sequence comparison tool that returns nucleotide or protein
sequences containing identical or similar regions with the input query sequence. All
BLAST types, including blastp, blastn, blastx, tblastn and tblastx are supported by
NFκB Base, allowing easy identification of matching or similar NFκB sequences to
the query sequence. The BLAST interface provides all standard configurable
parameters, similar to the original NCBI[116] BLAST interface.
4.2.5.2.3 Batch Download
NFκB Base provides batch download of all records or records specific to a particular
NFκB protein type stored in NFκB Base. Users can batch retrieve all annotations in
CSV format or sequence files in FASTA format.
Figure 10. BLAST interface for NFκB Base.
4.2.6 Discussion
NFκB Base represents the first specialised database containing annotated,
experimentally verified information pertaining to NFκB proteins. High quality data is
important for research, especially research utilizing computational approaches. With
increasing data publicly available in general and specialised databases, the laborious
procedures of data collection and annotation often form the rate-limiting step of data-centric research. NFκB Base aims to speed up and facilitate research on NFκB
transcription factors by providing readily accessible, high quality information on these
proteins. To further benefit the research community, NFκB Base is also part of the
BioDB100 initiative (http://biodb100.apbionet.org), which aims to support the
reproducibility of scientific data through archival and re-instantiation.
4.2.7 Future Work
As the quality of data lies at the core of databases like NFκB Base, future efforts
entail the compilation of additional information to be integrated in NFκB Base. The
database can be expanded to include relevant information such as conserved domain
and functional site data, as well as protein dynamics data such as annotations on either
experimentally verified or computationally predicted intrinsically disordered protein
residues. More interactive structure, sequence and phylogeny visualizations and
analysis tools can also be built into the NFκB Base web interface.
4.2.7.1 Community Annotation Policy
In addition, to speed up knowledge discovery and sharing, NFκB Base can adopt a
community annotation policy, similar to Allergen Atlas[117] and T3SEdb[118],
where users can submit new curated data and/or update existing NFκB information in
the database. The inclusion and exclusion criteria outlined in Figure 5 can be used as
a guide for the community annotation policy.
4.2.8 Chapter Conclusion
This study describes the development of a specialized database, NFκB Base,
containing specific annotated information on NFκB proteins, as well as integrated
analysis tools. The database contributes to research on the NFκB transcription
regulation machinery, through the sharing of curated information that leads to a better
understanding of the structure and function of these important proteins.
5 The Role of Conserved Disordered Residues in NFκB Function
5.1 Background
Contrary to the conventional view that folded, structured proteins are important for
function, it has been discovered that intrinsically disordered proteins or protein
regions, which are more flexible than their counterparts, contribute to a variety of
cellular functions[17]. The functional roles of these proteins or protein regions have
been studied, particularly in the field of cell signaling[14,15]. They allow for the
accommodation of multiple interaction partners and modification sites, as well as
regulation flexibility[18]. Analogous to sequence conservation of functionally
important sites, the evolutionary conservation of intrinsic protein disorder has been
discovered to be non-trivial and non-random, further signifying its functional
importance[26,53,54].
Literature review of studies on intrinsic protein disorder, as described in Section 1.3,
gave rise to my hypothesis that intrinsic protein disorder properties of cell signaling
proteins are evolutionarily conserved in protein families. Details on the formulated
hypothesis can be found in Section 1.4.
The hypothesis called for systematic protein sequence and disorder conservation
analyses on an exemplar cell signaling protein family for validation. To this end, the
Nuclear Factor Kappa-light-chain-enhancer of Activated B cells (NFκB/Rel) proteins,
crucial for processes such as cell proliferation, survival, inflammation and
immunity[55,58], were selected as the exemplar protein family. Currently, no
investigation on the functional role of intrinsic protein disorder in NFκB transcription
factors has been recorded. This study therefore addressed a specific gap in NFκB
literature.
In the final phase of my research project, I developed an in silico analysis pipeline for
the identification of intrinsically disordered protein residues and the analysis of the
conservation, localization and function of predicted disordered regions. The results of
the analysis revealed distinctive protein disorder distribution patterns across each
NFκB protein type, which are conserved across different species. This implies the
functional contribution of intrinsically disordered protein residues in promoting DNA
and ankyrin protein binding events.
5.2 Materials and Methods
5.2.1 Sequence Data Collection
NFκB protein sequences used for this study were collected from NFκB Base (Chapter
4), which contains 413 experimentally verified, manually annotated NFκB sequences.
To minimize redundancy, the collected sequences were further processed to select the
longest, unique representative sequence for each organism in each NFκB protein type.
A total of 113 representative NFκB sequences across multiple organisms were
analysed. These 113 sequences comprised 18 NFκB1 sequences, 11 NFκB2 sequences, 14 C-Rel sequences, 6
Dif sequences, 16 Dorsal sequences, 19 RelA sequences, 5 RelB sequences and 24
Relish sequences.
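The redundancy reduction step, selecting the longest sequence for each organism within each NFκB protein type, can be sketched as below. The record layout and the function name are hypothetical, and ties in length are assumed to be resolved by keeping the first record encountered.

```python
def select_representatives(records):
    """Keep the longest sequence per (protein_type, organism) pair.

    records: iterable of (protein_type, organism, sequence) tuples.
    Returns a dict mapping (protein_type, organism) to the chosen sequence.
    """
    best = {}
    for protein_type, organism, sequence in records:
        key = (protein_type, organism)
        # Strictly longer sequences replace the current choice; ties keep the first.
        if key not in best or len(sequence) > len(best[key]):
            best[key] = sequence
    return best
```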
5.2.2 Multiple Sequence Alignment
For each NFκB protein type, multiple sequence alignment was performed using
Multiple Sequence Comparison by Log-Expectation (MUSCLE)[119] hosted on
EUAsiaGrid[120]. The alignments were checked and edited manually for
misalignments in BioEdit[121]. Positions in the alignments where more than 50% of
the sequences contained a gap were removed to yield entropy and conservation
measurements with high statistical support.
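The gap filter can be expressed compactly. In this sketch, which is not the procedure carried out in BioEdit, the alignment is assumed to be a list of equal-length strings with "-" as the gap character, and the function name is hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap="-"):
    """Remove alignment columns in which more than max_gap_fraction of rows are gaps."""
    n_rows = len(alignment)
    n_cols = len(alignment[0])
    keep = [
        i for i in range(n_cols)
        if sum(1 for row in alignment if row[i] == gap) / n_rows <= max_gap_fraction
    ]
    return ["".join(row[i] for i in keep) for row in alignment]
```

A column with exactly 50% gaps is retained, since only positions where more than 50% of the sequences contain a gap are removed.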
5.2.3 Entropy Analysis
Based on the multiple sequence alignment, the level of amino acid residue
conservation at each position was inferred using Shannon’s entropy values calculated
using BioEdit[121]. The entropy value at each position provides a measure of
uncertainty at each position relative to other positions and is defined as
H(x) = -∑ f(b, x) ln f(b, x), summed over all residues b, where f(b, x) is the frequency at which residue b is found at position
x[122]. The maximum entropy is calculated as Hmax = ln n = ln 20, where n
represents the maximum number of variations at a particular position[122]. High
entropy values correspond to positions in the alignment with high variability and thus
low residue conservation.
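The entropy definition above can be illustrated with a small sketch. It is not the BioEdit implementation: gap characters are assumed to be ignored when computing frequencies, and the function names are hypothetical.

```python
import math
from collections import Counter

def column_entropy(column, gap="-"):
    """Shannon entropy (natural log) of the residues in one alignment column."""
    residues = [c for c in column if c != gap]  # assumption: gaps are ignored
    if not residues:
        return 0.0
    total = len(residues)
    entropy = -sum(
        (count / total) * math.log(count / total)
        for count in Counter(residues).values()
    )
    return entropy if entropy > 0 else 0.0  # avoid returning -0.0

def alignment_entropy(alignment):
    """Entropy at each position of an alignment given as equal-length strings."""
    return [column_entropy(column) for column in zip(*alignment)]

H_MAX = math.log(20)  # maximum entropy for 20 amino acids
```

A fully conserved column gives an entropy of 0, and a column split evenly between two residues gives ln 2, well below the maximum of ln 20.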
5.2.4 Intrinsic Protein Disorder Analysis
DisBatch (Version 0.02, Chapter 3) was used to predict potentially disordered
residues, with the threshold set at 0.5 (for positive prediction by PrDOS[38] only) and
1.5 (for positive prediction by both PrDOS[38] and FoldIndex[32]).
5.2.5 Conservation of Intrinsic Protein Disorder
Conservation of intrinsic protein disorder at each residue position in the multiple
sequence alignment, for each NFκB protein type across multiple species, was
measured first using the standard deviation (SD), followed by the coefficient of
variation (CV). While SD is unit-dependent, CV represents a normalized, scale-invariant measure of the dispersion of the disorder scores around the mean.
CV is expressed as the ratio of the standard deviation to the mean: CVx = σx / µx,
where σx and µx represent the standard deviation and the mean of the DisBatch protein disorder
scores at position x, across all sequences (from multiple organisms) of a specific
NFκB type in the multiple sequence alignment. Low CV values correspond to residue
positions in the alignment where DisBatch protein disorder scores across all
sequences share low standard deviations in relation to the mean, thereby implying
conservation of the average disorder scores.
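The SD and CV calculation at a single alignment position can be sketched as follows. The text does not state whether the population or the sample standard deviation was used, so the population form is an assumption here, as is returning None when the mean is zero; the function name is hypothetical.

```python
import statistics

def disorder_sd_and_cv(scores):
    """Return (SD, CV) of the DisBatch scores at one alignment position.

    scores: disorder scores of all sequences of one NFκB type at that position.
    """
    mean = statistics.mean(scores)
    sd = statistics.pstdev(scores)  # assumption: population standard deviation
    cv = sd / mean if mean != 0 else None  # CV is undefined when the mean is 0
    return sd, cv
```

A position where every sequence receives the same non-zero score has a CV of 0, the strongest possible conservation of the disorder score under this measure.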
5.2.6 Structural Analysis
Structural data of NFκB proteins were mined from annotated sequence records and
extracted from the Protein Data Bank (release date: June 1, 2010)[37]. A total of
35 wild-type and mutated NFκB protein structures in bound states from all
available species were collected and carefully reviewed. Protein sequences from
structural records were a subset of the NFκB sequence dataset in NFκB Base that was
used in the analysis of intrinsic protein disorder properties. 16 representative, unique
structures for each NFκB dimer combination in either active or inhibited states were
selected for further analysis. The NFκB protein structures were superimposed and
annotated using PyMOL (version 0.99rc6)[123]. Root mean square deviation (RMSD)
values, which summarise the distance between superimposed atoms, were also
calculated in PyMOL. B-factors for each atom, indicative of the degree of thermal
displacement, were extracted from PDB files and visualized graphically in
PyMOL[37].
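The two structural quantities mentioned above can be illustrated outside PyMOL with a short sketch. It assumes standard fixed-column PDB ATOM records, in which the temperature factor occupies columns 61 to 66, and coordinates that have already been superimposed; the superposition itself is not reproduced, and the function names are hypothetical.

```python
import math

def read_b_factors(pdb_text, atom_name="CA"):
    """Extract B-factors of the named atoms from the ATOM records of a PDB file."""
    values = []
    for line in pdb_text.splitlines():
        # Atom name: columns 13-16; temperature factor: columns 61-66 (1-based).
        if line.startswith("ATOM") and line[12:16].strip() == atom_name:
            values.append(float(line[60:66]))
    return values

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) tuples."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))
```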
5.3 Results
5.3.1 Conserved intrinsic protein disorder signatures in NFκB
To predict intrinsically disordered residues in both Class I and Class II NFκB
proteins, DisBatch was used. The average DisBatch score for each position in the
multiple sequence alignment was calculated and compared with Shannon’s entropy
values[122] to check for any correlation between intrinsic protein disorder and residue
conservation. High entropy values represent positions in the alignment with high
variability, implying low residue conservation.
NFκB proteins share a conserved N-terminal Rel homology (RH) domain, which is
further subdivided into the N-terminal specificity sub-domain and the C-terminal IPT
sub-domain[58]. The N-terminal specificity (RHD) sub-domain resembles the core
domain of the transcription factor p53, whereas the C-terminal IPT sub-domain is an
immunoglobulin like fold containing the interaction site of NFκB with its inhibitor,
IκB[58]. The RH domain also contains DNA binding sites, ankyrin protein binding
sites and the dimerization interface[58].
As expected, Shannon’s entropy analysis[122] in each NFκB protein type showed
high residue conservation within the NH2-terminal Rel homology (RH) domain,
particularly for the DNA and ankyrin protein binding sites, as observed by troughs
representing lower entropy values in Figure 11 and Figure 12. Unlike Class II NFκB
proteins, Class I NFκB proteins contain protein-protein interaction domains including
ankyrin repeats[124-127] in the ANK domain and the Death domain[128] at the C-terminus. The ANK domain[124-127] is responsible for mediating protein-protein
interactions, while the Death domain[128] acts as an adaptor and recruiter in signaling
pathways [64]. In comparison with the N-terminal RH domain, the ANK and Death
domains in Class I NFκB proteins appeared to be less conserved according to the
Shannon’s entropy values[122]. Nevertheless, there seemed to be no apparent
correlation between the average disorder score and Shannon’s entropy values,
suggesting that intrinsic protein disorder is not associated with residue conservation.
The distribution of the average disorder score in each NFκB protein type (Figure 11
& Figure 12) showed predicted moderately and highly disordered residues (residues
with an average disorder score of ≥0.5 and ≥1.5) to be associated with DNA binding
sites in the RHD N-terminal specificity structural sub-domain, as well as with the
ankyrin protein binding sites in the C-terminal IPT structural sub-domain for both
Class I and Class II NFκB proteins.
The ANK and Death domains in Class I NFκB proteins (Figure 11) were observed to
be generally non-disordered. Notably, a prominent spike of average disorder score at
the N-terminal end of the ANK domain was seen in all Class I proteins, except Relish.
Interestingly, DNA binding sites in Class I proteins were observed to be more
disordered, with generally higher average disorder scores than Class II proteins. Some
DNA binding residues at the C-terminal end of the RHD domain, at approximate
alignment position 150-200, were predicted to be ordered in most Class I and Class II
proteins but were absent in RelA, RelB and C-Rel. Furthermore, residues that are part
of the dimerization interface were found to be generally more disordered in Class I
proteins (except in NFκB1) and less disordered in Class II proteins. These differences
in protein disorder patterns between Class I and Class II proteins may shed light on
their differences in mechanism and function.
Figure 11. Distribution of the average disorder score at each alignment position for Class I NFκB proteins at the
RHD domain of A) NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch. The average disorder score
cutoffs of 0.5 and 1.5 were used to distinguish between moderately (predicted only by PrDOS to be disordered)
and highly disordered (predicted by both PrDOS and FoldIndex) residues, respectively. Shannon’s entropy values
were also plotted in the graph for comparison.
Figure 12. Distribution of the average disorder score at each alignment position for Class II NFκB proteins at the
RHD domain of A) RelA, B) RelB, C) C-Rel, D) Dorsal and E) Dif, as predicted by DisBatch.
Figure 13. Distribution of the average disorder score at each alignment position for Class I NFκB proteins at the
IPT domain of A) NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch.
Figure 14. Distribution of the average disorder score at each alignment position for Class II NFκB proteins at the
IPT domain of A) RelA, B) RelB, C) C-Rel, D) Dorsal and E) Dif, as predicted by DisBatch.
Figure 15. Distribution of the average disorder score at each alignment position for Class I NFκB proteins at sites
with no functional annotation in A) NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch.
Figure 16. Distribution of the average disorder score at each alignment position for Class II NFκB proteins at sites
with no functional annotation in A) RelA, B) RelB, C) C-Rel, D) Dorsal and E) Dif, as predicted by DisBatch.
Figure 17. Distribution of the average disorder score at each alignment position for Class I NFκB proteins at the
ANK domain (in red) and Death domain (in black) of A) NFκB1, B) NFκB2 and C) Relish, as predicted by
DisBatch.
To further assess the conservation of protein disorder properties in NFκB proteins, I
measured the standard deviation of the average disorder score, as well as the scale-invariant coefficient of variation (CV) values at each position. Scatter plots for each
NFκB protein type were generated to further examine the relationship between the
average disorder score and the standard deviation and CV of the average disorder
score (Figure 18-Figure 23). Each point in the scatter plot represents a specific
alignment position in a particular NFκB protein type. The scatter plots of both the
standard deviation and CV against the average disorder score generally agree with
each other (Figure 18-Figure 23). These plots show distinct quadrants, mainly i)
conserved, non-disordered (residues not predicted to be disordered by DisBatch;
bottom left of scatter plot) and ii) conserved, disordered (bottom right; Figure 18-Figure 23).
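The per-position dispersion measures and the quadrant assignment described above can be sketched as follows. This is only an illustration: the use of the population standard deviation, the dispersion cutoff of 0.5 and the function names are assumptions, since the exact quadrant boundaries are read from the scatter plots rather than stated numerically.

```python
import statistics


def position_stats(scores):
    """Return (mean, SD, CV) of the disorder scores at one alignment position."""
    mean = statistics.fmean(scores)
    sd = statistics.pstdev(scores)  # population SD: an assumption
    cv = sd / mean if mean > 0 else None  # CV is undefined when the mean is 0
    return mean, sd, cv


def quadrant(mean, dispersion, disorder_cutoff=0.5, dispersion_cutoff=0.5):
    """Label a position by disorder (x-axis) and dispersion (y-axis)."""
    conserved = "conserved" if dispersion < dispersion_cutoff else "non-conserved"
    state = "disordered" if mean >= disorder_cutoff else "non-disordered"
    return f"{conserved}, {state}"


# Hypothetical score columns, one list per alignment position.
columns = {
    "always ordered": [0, 0, 0, 0],
    "always disordered": [2, 2, 2, 2],
    "variable": [0, 2, 0, 2],
}
labels = {}
for name, scores in columns.items():
    mean, sd, cv = position_stats(scores)
    labels[name] = quadrant(mean, sd)
```

Here the standard deviation is used as the dispersion axis. Passing the CV instead would reproduce the second set of plots, with the caveat that positions whose mean is zero have no defined CV.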
In all NFκB protein types, the conserved, non-disordered quadrant comprised some
ankyrin protein binding sites, IPT domain residues, RHD domain residues and non-functionally annotated residues. Most ANK and Death domain residues in Class I
proteins were represented as outliers in the conserved, non-disordered quadrant, while
DNA binding sites in Class I NFκB proteins tended to lie in proximity to or within
the conserved disordered quadrant (Figure 18 & Figure 21). In addition, only
dimerization interface residues in NFκB2 were observed clearly in both plots to lie
within the conserved disordered quadrant (Figure 19 & Figure 22), while
dimerization interface residues in Class II proteins were generally found in the non-disordered
quadrants (Figure 19, Figure 20, Figure 22 & Figure 23). Class II NFκB proteins,
on the other hand, had slightly fewer DNA binding sites within the conserved
disordered quadrant and slightly more within the conserved, non-disordered quadrant,
compared to the Class I proteins (Figure 19, Figure 20, Figure 22 & Figure 23).
Interestingly, many DNA binding sites in Dif were found in the non-conserved,
disordered quadrant (Figure 20 & Figure 23).
The scatter plot of the average disorder score against the CV of the average disorder
score indicated a general inverse correlation (Figure 21, Figure 22 & Figure 23).
The correlation between the two variables was not perfect and quantification of the
correlation using the coefficient of determination (R²) yielded low values of ≤ 0.5.
This may possibly be an artefact since it is expected for the CV and average disorder
score to show an inverse relationship if the standard deviation does not depend on the
mean.
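The reasoning behind this caveat can be written out explicitly. By definition the CV at alignment position i is the standard deviation divided by the mean, so if the standard deviation is roughly the same at every position, the CV must fall as the mean rises. The constant value of 0.4 used below is an illustrative assumption and not a value measured in this study.

```latex
\mathrm{CV}_i = \frac{\sigma_i}{\mu_i}
\qquad\Longrightarrow\qquad
\mathrm{CV}_i \approx \frac{\sigma}{\mu_i}
\quad \text{if } \sigma_i \approx \sigma \text{ for all positions } i .

% Illustrative numbers (not data from this study), with \sigma = 0.4:
\mu_i = 0.5 \;\Rightarrow\; \mathrm{CV}_i = 0.8,
\qquad
\mu_i = 2.0 \;\Rightarrow\; \mathrm{CV}_i = 0.2 .
```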
Figure 18. Scatter plot of average disorder score against the standard deviation of disorder scores for Class I NFκB
proteins, A) NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch. The scatter plots show 2 distinct
quadrants of: conserved non-disordered residues (bottom left) and conserved disordered residues (bottom right).
Functional domains and sites were annotated in the graph and coloured accordingly.
Figure 19. Scatter plot of average disorder score against the standard deviation of disorder scores for Class II
NFκB proteins, A) RelA, B) RelB and C) C-Rel, as predicted by DisBatch.
Figure 20. (Cont’d from Figure 19) Scatter plot of average disorder score against the standard deviation of average
disorder score for Class II NFκB proteins, A) Dorsal, B) Dif, as predicted by DisBatch.
Figure 21. Scatter plot of average disorder score against the CV of average disorder score for Class I NFκB
proteins, A) NFκB1, B) NFκB2 and C) Relish, as predicted by DisBatch. The scatter plot shows 4 distinct
quadrants of: non-conserved, non-disordered residues (top left of scatter plot), non-conserved disordered residues
(top right), conserved non-disordered residues (bottom left) and conserved disordered residues (bottom right).
Functional domains and sites were annotated in the graph and coloured accordingly.
Figure 22. Scatter plot of average disorder score against the CV of average disorder score for Class II NFκB
proteins, A) RelA, B) RelB and C) C-Rel, as predicted by DisBatch.
Figure 23. (Cont’d from Figure 22) Scatter plot of average disorder score against the CV of average disorder score
for Class II NFκB proteins, A) Dorsal, B) Dif, as predicted by DisBatch.
5.3.2 Structural Analysis
Following intrinsic protein disorder analysis at the sequence level, an attempt was
made to map the predicted disordered residues in NFκB proteins to representative 3D
structural data. The structural analysis was conducted to provide a more in-depth case
study of NFκB sequence properties and the structures analyzed represented a subset of
the NFκB protein sequence dataset.
Prior to this, all available 3D structures in each NFκB protein type were
superimposed against each other to confirm the representativeness of the selected
structure sample. All superimposed structures exhibited high structural similarity with
low root mean square deviation (RMSD) values (data not shown).
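For reference, the RMSD between two superimposed structures can be computed as in the sketch below. It assumes that the superposition itself has already been carried out by a structure alignment tool and that the two coordinate lists contain the same atoms in the same order. The function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(squared / len(coords_a))


# Toy coordinates in angstroms: the second set is shifted by 1 along z.
structure_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
structure_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
rmsd_value = rmsd(structure_a, structure_b)
```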
Conserved non-disordered and conserved disordered residues in each NFκB protein
type were mapped to available PDB[37] structures, according to their respective
quadrants demarcated in Figure 18 to Figure 23. These annotated structures were
compared with β-factor values and visualized as heat maps according to the original
PDB annotation[37].
Structural analysis of Class I NFκB homodimers showed most predicted conserved,
non-disordered residues to surround the DNA at the N-terminal RHD domain and
most conserved disordered residues to be present at the C-terminal IPT domain
containing the ankyrin protein binding sites and dimerization interface (Figure 24).
Only residues in the N-terminal Rel Homology (RH) domain were visible in 3D
structures, while coordinates of residues occurring in the C-terminal ANK and Death
domains were not present in the PDB[37] records.
Class II homodimers showed more conserved disordered residues surrounding the
DNA and more conserved, non-disordered residues at the C-terminal IPT domain with
ankyrin protein binding sites and the dimerization interface (Figure 25).
In addition, the insert α-helical regions of both Class I and Class II NFκB dimers were
found to be disordered (Figure 24 & Figure 25). The insert region in Class I proteins
differed from the Class II proteins in that the former contains an additional α-helix.
These disordered insert α-helical regions formed grooves in the NFκB proteins that
resembled potential protein binding sites. In Class I NFκB proteins the clefts formed
by the insert regions were narrow and deep (Figure 24) whereas in the Class II NFκB
proteins the clefts formed by the insert regions were much wider and shallower
(Figure 25). According to literature, the insert regions represent potential interaction
and/or binding sites[96].
NFκB heterodimers are formed between Class I and Class II proteins. As expected,
heterodimers contain a hybrid of conserved non-disordered and disordered residues in
all functional sites and domains, suggesting distinct mechanisms and functions
differing between Class I and Class II homodimers (Figure 26). For the heterodimers,
the configuration of the conserved disordered and non-disordered residues of their
component monomers matched with those observed in their respective homodimers.
Interestingly, inhibited NFκB dimers were largely made of conserved disordered
residues. In contrast, their inhibitors (IκB) were found to be greatly conserved and
non-disordered (Figure 27).
The protein structures annotated with intrinsic protein disorder information were
compared with their respective β-factor annotation. β-factors are also known as
temperature factors, which represent the degree of thermal displacement and thus
flexibility of an atom[37]. Research has associated intrinsically disordered protein
regions with high β-factors[129]. Here, the results showed a general, loose agreement
between conserved disordered residues and residues with high β-factors, as well as
conserved non-disordered residues and residues with low β-factors. This is with the
exception of Class I monomers (NFκB1 and NFκB2) that are part of the heterodimers
in Figure 26, where residues with high β-factors were found to be conserved and non-disordered.
From these observations, the DisBatch predictor appears to be more sensitive towards
intrinsically disordered regions than the annotated β-factors are. There is a
possibility that the predictor may “over-predict” disordered regions, thus operating at
the lower range of accuracy with false positives. Alternatively, the predictor may
be detecting regions that have some dynamics but that generally evolved not to be
intrinsically disordered as such, but to remain disordered until a binding event occurs. In that mode,
some sequence conservation of these dynamic properties is expected since
intrinsically disordered proteins exhibit structure-function relationship in the bound
form. Nevertheless, β-factors are not guaranteed reliable indicators of disorder, since
they vary according to the effects of local packing and the structural environment
[130].
Figure 24. Structures of representative Class I NFκB homodimers, NFκB1 (top) and NFκB2 (bottom), coloured
according to protein disorder annotations (left) and β-factors (right). The C-terminal IPT domain contains ankyrin
protein binding sites enveloping the dimerization interface. Ankyrin repeats and the Death domain were not
present in the 3D structures. The α-helical insert regions are conserved disordered residues, highlighted in red, at
the left of the protein structure in the N-terminal RHD domain.
Figure 25. Structures of representative Class II NFκB homodimers, RelA (top) and C-Rel (bottom), coloured
according to protein disorder annotations (left) and β-factors (right).
Figure 26. Structures of representative NFκB heterodimers formed between Class I and Class II NFκB proteins,
coloured according to protein disorder annotations (left) and β-factors (right). Examples shown here are the RelA-NFκB1 (top) and RelB-NFκB2 (bottom) heterodimers.
Figure 27. Structures of representative RelA homodimer (top) and RelA-NFκB1 heterodimer (bottom) in the IκB
inhibited state, coloured according to protein disorder annotations (left) and β-factors (right).
5.4 Discussion
This study aimed to examine the functional roles of intrinsic protein disorder in the
exemplar NFκB transcription factors. To this end, I have utilized a suite of
computational tools to identify and analyze intrinsically disordered protein regions in
different types of NFκB proteins at both sequence and structural levels. Comparisons
were made between our findings and well-known measures, including Shannon’s
entropy values[122] and β-factors[37]. β-factors are well known to correspond to
crystal quality and R-value in any given crystal structure[37].
From both sequence and structural analysis, key differences in terms of protein
disorder patterns between the Class I and Class II NFκB proteins have been observed.
Firstly, Class I NFκB proteins were more disordered in the vicinity of the DNA
contacting sites at the N-terminal RHD domain compared to Class II proteins. This
may explain their reported stronger DNA binding activity in comparison with Class II
proteins[58,106]. Protein recognition of DNA target sites represents a crucial event
for key gene functions, one of which includes transcription. It has been reported that
protein-DNA interactions proceed via an initial, non-specific binding state which
accelerates the search for target sites through multiple intramolecular processes,
including diffusion along the DNA[131,132]. This is facilitated by flexible, dual-role
residues which act as switches between non-specific and specific binding states
through conformational changes and rearrangements[131,132]. More specifically,
Kalodimos et al. observed that the hinge region in the DNA binding domain of the
lactose repressor is disordered in the free as well as non-specific binding state but
forms an α-helix in the specific binding state[132].
Hence, it could be proposed that disordered residues in NFκB proteins function
similarly in promoting protein-DNA interactions.
Secondly, Class I NFκB proteins were more disordered than Class II proteins at their
dimerization interfaces, suggesting different modes of dimerization. However, there
seemed to be no substantial literature on the differences in dimerization modes
between both NFκB protein classes.
Class I NFκB proteins contain ankyrin repeats in the ANK domain[124-127,133-135]
and the Death domain[128], not found in Class II proteins. Both domains have been
discovered to be non-disordered. For Class I proteins, only predicted disordered
residues occurring in the Rel Homology (RH) domain could be mapped to the 3D
crystal structure, whereas ankyrin repeats and Death domains occurring in the N-terminal IPT domain were not found in the structure. On the other hand, Class II
NFκB proteins have been reported to possess a potent trans-activation domain at the
C-terminus[58]. However, sequence-specific positions of the trans-activation domain
were not found in NFκB records in major databases during the data mining step.
Therefore, analyses of the presence, sequence localization and/or structural
localization of the N-terminal IPT and trans-activation domains were not applicable to
this study.
From this study, it could be observed that not all functional sites were intrinsically
disordered. Most ANK[125] and Death domain[128] residues in Class I proteins were
discovered to be conserved and non-disordered. Some ankyrin protein binding site
residues in both classes of proteins were conserved and non-disordered, while the rest
were either non-disordered or disordered in a conserved or non-conserved manner.
This could also be applied to RHD and IPT domain residues.
Nonetheless, many functional sites in NFκB proteins have been observed to be
conserved, both in terms of sequences (from Shannon’s entropy values)[122] and
intrinsic protein disorder properties (from SD and CV values). The conservation of
intrinsic protein disorder was not reflected by Shannon’s entropy analysis[122], since
there was no general observable trend between the dispersion of the average disorder
score and Shannon’s entropy values (data not shown)[122]. In fact, Chen et al. have
revealed Shannon’s entropy analysis to show relatively lower levels of sequence
conservation in disordered regions compared to non-disordered regions[51]. This
suggested that the conservation of protein disorder may capture some functional
aspects of NFκB proteins not reflected via residue conservation. I therefore propose
the use of the standard deviation (SD) and/or coefficient of variation (CV) as a more
appropriate measure of determining the conservation of protein disorder, since they
take into account the conservation of intrinsic protein disorder as a general
characteristic, possibly encompassing physicochemical and structural properties rather
than a residue-specific characteristic.
Taken together, the results suggested that evolutionarily conserved disordered
residues offer an alternative perspective on the functional roles of NFκB proteins,
especially in facilitating binding events including DNA binding, ankyrin protein
binding and possibly the binding of other proteins (from the discovery of disordered
α-helix insert regions). However, conserved non-disordered residues do also
contribute to specific NFκB functions. This highlighted that protein functions in
NFκB transcription factors are not all necessarily affected through protein disorder.
Rather, specific localizations of disordered and non-disordered residues in each
functional site contribute to various aspects of protein function, and each residue type
plays unique, specific functional roles.
Taking this view, intrinsic protein disorder signatures may be critical in determining
protein function. In this study, it has been shown that differences in intrinsic protein
disorder signatures may account for the differences in mechanism between
the two classes of NFκB proteins. Such differences may even exist between different
types of proteins, as seen from certain exceptions in disorder signatures mentioned in
Section 5.3. The monomeric composition of the NFκB dimers has been found to
affect its DNA-binding site specificity, subcellular localization, trans-activation
potential and mode of regulation, thus leading to combinatorial diversity of the
downstream responses[63,136]. Here, intrinsic protein disorder signatures for each
NFκB protein type may account for the variability in function for each type of NFκB
dimer.
5.5 Future Work
This research demonstrated how the analysis of protein disorder can be applied to
study the function of transcription factors such as the NFκB protein family, which are
involved in many important cellular and organismal processes. Further experimental
validation and characterization of these predicted disordered residues, through both
computational and laboratory means, are necessary to support the functional roles of
these conserved disordered (and/or non-disordered) residues.
Experimentally, imaging approaches can be used to observe in more detail the
dynamics involved in NFκB interactions. Mutagenesis studies can be performed on
conserved disordered residues to observe the change in function. Also, systems
biology approaches integrating both experimental and computational methods can be
utilized in the future for modeling and understanding NFκB interactions and
pathways.
Computationally, datasets on additional protein families can be used as positive and
negative controls for a more robust analysis. These control datasets can also be used
to rigorously test the use of SD and/or CV as a quantifying measure of the
conservation of intrinsic protein disorder.
In addition, the value of using unique disorder motifs for protein families (in the form
of matrices) for functional prediction, using approaches such as machine learning
methods, can be investigated. If it can be demonstrated that protein families indeed
have distinct disorder patterns, a database of these signatures can be developed for the
benefit of the scientific community.
Lastly, more in-depth structural analysis can be conducted to lend support to the
findings in this study. More work can be done to quantify the correlation between
intrinsic protein disorder and β-factors. Smith et al. advised on normalizing β-factors
for more accurate comparisons[130]. The same procedure can be applied to this study
in the future. Further structural analysis can also include molecular dynamics
simulations and probabilistic conformational space sampling of known NFκB
interacting structures, to investigate the effects and functional implications of
interactions involving intrinsically disordered residues on permitted NFκB protein
conformations.
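The β-factor normalization mentioned above could be sketched as a per-chain z-score over Cα atoms, as shown below. Whether this matches the exact procedure of the cited work[130] is an assumption, as are the use of the population standard deviation, the restriction to Cα atoms and the function names. The example records are made up and are generated from a template so that the fixed PDB columns line up.

```python
import statistics


def ca_bfactors(pdb_lines):
    """Collect C-alpha B-factors keyed by (chain ID, residue number)."""
    bfactors = {}
    for line in pdb_lines:
        # Fixed PDB columns (1-based): atom name 13-16, chain 22,
        # residue number 23-26, temperature factor 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            bfactors[(line[21], int(line[22:26]))] = float(line[60:66])
    return bfactors


def normalize(bfactors):
    """Z-score the B-factors of one chain: (B - mean) / SD."""
    values = list(bfactors.values())
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)  # population SD: an assumption
    if sd == 0:
        raise ValueError("B-factors are constant; z-scores are undefined")
    return {key: (b - mean) / sd for key, b in bfactors.items()}


# Made-up ATOM records (not taken from any PDB entry).
TEMPLATE = "ATOM  {:5d} {:<4s} {:>3s} {:1s}{:4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}"
records = [
    TEMPLATE.format(1, " N  ", "MET", "A", 1, 0.0, 0.0, 0.0, 1.0, 12.0),
    TEMPLATE.format(2, " CA ", "MET", "A", 1, 1.5, 0.0, 0.0, 1.0, 10.0),
    TEMPLATE.format(3, " CA ", "ALA", "A", 2, 5.3, 0.0, 0.0, 1.0, 20.0),
    TEMPLATE.format(4, " CA ", "GLY", "A", 3, 9.1, 0.0, 0.0, 1.0, 30.0),
]
raw = ca_bfactors(records)
z_scores = normalize(raw)
```

After normalization, values from different crystal structures share a common scale (zero mean, unit spread), which makes it more reasonable to compare them against predicted disorder across structures.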
5.6 Chapter Conclusion
Protein disorder has been implicated in various regulatory functions in the cell,
particularly in transcription and cell signaling, allowing for the accommodation of
multiple interaction partners and modification sites and the provision of flexibility in
regulation[15,16]. In this chapter, I have described a study aimed at investigating the
functional role of intrinsically disordered regions in NFκB proteins, a set of
eukaryotic transcription factors involved in diverse cellular and organismal processes.
I have examined the conservation of predicted disordered regions across known
representative NFκB protein sequences and analyzed the sequence and structural
configuration of these disordered residues. From this study, distinctive protein
disorder patterns across each NFκB protein type conserved across different species
were observed, potentially highlighting key differences in mechanism and function
between NFκB protein classes and types. The study of intrinsic protein disorder therefore
provides a different perspective for gaining a more in-depth understanding of the
mechanisms underlying NFκB transcriptional regulation.
6 Conclusion
For my thesis, I have developed an analysis pipeline, comprising computational
prediction, data mining and sequence and structure analysis, to investigate the
functional role of intrinsic protein disorder in the exemplar NFκB protein family.
The pipeline represents the first critical step in analyzing and understanding intrinsic
protein disorder and its role in protein function. Quantitative and qualitative findings
of this project supported the emerging paradigm that the dynamics of signaling
proteins, in general, play important roles in modulating their functions. Protein
disorder thus offers a new way of analyzing and understanding protein-protein
binding and interactions, contributing to the further understanding of functional
conservation in relatively diverse proteins.
One exciting and meaningful implication of this project is that protein families may
possess distinctive disorder signatures or motifs, which can prove valuable for
functional characterization. To this end, the protein disorder analysis pipeline
outlined, once established, can be applied to study the conservation and configuration
of dynamic residues in other transcription factors and protein families. Key questions
that can be addressed in future can include those pertaining to the complexity of
transcription regulation machinery and signaling pathways encompassing signal
integration and cross-talk.
The methodology and findings of this study lay the foundation for similar research
in the future, thereby contributing to research in the fields of
transcriptional regulation and cellular signaling.
7 References
1. Barnes C, Monteith W, Pielak G. Internal and Global Protein Motion Assessed with a
Fusion Construct and In-Cell NMR Spectroscopy. Chembiochem. 2010;
2. Miller RJD. Energetics and Dynamics of Deterministic Protein Motion. Acc. Chem. Res.
1994;27(5):145-150.
3. Hammes-Schiffer S, Benkovic S. Relating protein motion to catalysis. Annu Rev Biochem.
2006;75:519-541.
4. Genberg L, Richard L, McLendon G, Miller R. Direct observation of global protein motion in
hemoglobin and myoglobin on picosecond time scales. Science. 1991;251(4997):1051-1054.
5. Jasnin M, Moulin M, Haertlein M, Zaccai G, Tehei M. In vivo measurement of internal and
global macromolecular motions in Escherichia coli. Biophys J. 2008;95(2):857-864.
6. Terada TP, Sasai M, Yomo T. Conformational change of the actomyosin complex drives the
multiple stepping movement. Proc Natl Acad Sci U S A 2002;99(14):9202-9206.
7. Hirose S, Yokota K, Kuroda Y, Wako H, Endo S, Kanai S, et al. Prediction of protein motions
from amino acid sequence and its application to protein-protein interaction. BMC Struct Biol
2010;10:20.
8. Echols N, Milburn D, Gerstein M. MolMovDB: analysis and visualization of conformational
change and structural flexibility. Nucleic Acids Res 2003;31(1):478-482.
9. Wootton JC. Non-globular domains in protein sequences: automated segmentation using
complexity measures. Comput Chem 1994;18(3):269-285.
10. Smock R, Gierasch L. Sending signals dynamically. Science. 2009;324(5924):198-203.
11. Bu Z, Callaway D. Proteins MOVE! Protein dynamics and long-range allostery in cell
signaling. Adv Protein Chem Struct Biol. 2011;83:163-221.
12. Ashish, Juncadella IJ, Garg R, Boone CD, Anguita J, Krueger JK, et al. Conformational
rearrangement within the soluble domains of the CD4 receptor is ligand-specific. J Biol Chem
2008;283(5):2761-2772.
13. Good M, Zalatan J, Lim W. Scaffold proteins: hubs for controlling the flow of cellular
information. Science. 2011;332(6030):680-686.
14. Structure of tumor suppressor p53 and its intrinsically disordered N-terminal
transactivation domain. Proceedings of the National Academy of Sciences; 2008. 5762 p.
15. Cortese M, Uversky V, Dunker A. Intrinsic disorder in scaffold proteins: getting more
from less. Prog Biophys Mol Biol. 2008;98(1):85-106.
16. Dunker A, Silman I, Uversky V, Sussman J. Function and structure of inherently
disordered proteins. Curr Opin Struct Biol. 2008;18(6):756-764.
17. Uversky V, Dunker A. Understanding protein non-folding. Biochim Biophys Acta.
2010;1804(6):1231-1264.
18. Linding R, Jensen L, Diella F, Bork P, Gibson T, Russell R, et al. Protein disorder prediction:
implications for structural proteomics. Structure. 2003;11(11):1453-1459.
19. Dunker A, Oldfield C, Meng J, Romero P, Yang J, Chen J, et al. The unfoldomics decade:
an update on intrinsically disordered proteins. BMC Genomics. 2008;
20. Uversky V. Natively unfolded proteins: a point where biology waits for physics. Protein
Sci. 2002;11(4):739-756.
21. Radhakrishnan I, Pérez-Alvarado G, Parker D, Dyson H, Montminy M, Wright P, et al.
Solution structure of the KIX domain of CBP bound to the transactivation domain of CREB: a
model for activator:coactivator interactions. Cell. 1997;91(6):741-752.
22. Sigalov AB, Uversky VN. Differential occurrence of protein intrinsic disorder in the
cytoplasmic signaling domains of cell receptors. Self/nonself 2011;2(1):55-72.
23. Sandhu KS. Intrinsic disorder explains diverse nuclear roles of chromatin remodeling
proteins. J Mol Recognit 2009;22(1):1-8.
24. Mészáros B, Simon I, Dosztányi Z. The expanding view of protein-protein interactions:
complexes involving intrinsically disordered proteins. Phys Biol 2011;8(3):035003.
25. Mészáros B, Simon I, Dosztányi Z. Prediction of protein binding regions in disordered
proteins. PLoS Comput Biol 2009;5(5):e1000376.
26. Brown CJ, Johnson AK, Dunker AK, Daughdrill GW. Evolution and disorder. Curr Opin
Struct Biol 2011;
27. Dosztányi Z, Tompa P. Prediction of protein disorder. Methods Mol Biol. 2008;:103-115.
28. 17 Protein disorder prediction servers [Internet]. [updated 2009; cited 2011]. Available
from: http://rosettadesigngroup.com/blog/521/17-protein-disorder-prediction-servers/
29. Romero P, Obradovic Z, Li X, Garner EC, Brown CJ, Dunker AK, et al. Sequence complexity
of disordered protein. Proteins 2001;42(1):38-48.
30. Dunker A, Brown C, Lawson J, Iakoucheva L, Obradović Z. Intrinsic disorder and protein
function. Biochemistry. 2002;41(21):6573-6582.
31. Li, Romero, Rani, Dunker, Obradovic. Predicting Protein Disorder for N-, C-, and Internal
Regions. Genome Inform Ser Workshop Genome Inform 1999;10:30-40.
32. Prilusky J, Felder CE, Zeev-Ben-Mordehai T, Rydberg EH, Man O, Beckmann JS, et al.
FoldIndex: a simple tool to predict whether a given protein sequence is intrinsically
unfolded. Bioinformatics 2005;21(16):3435-3438.
33. Dosztányi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of
intrinsically unstructured regions of proteins based on estimated energy content.
Bioinformatics 2005;21(16):3433-3434.
34. Linding R, Russell RB, Neduva V, Gibson TJ. GlobPlot: exploring protein sequences for
globularity and disorder. Nucleic Acids Res 2003;:3701-3708.
35. Kabsch W, Sander C. Dictionary of protein secondary structure: pattern recognition of
hydrogen-bonded and geometrical features. Biopolymers 1983;22(12):2577-2637.
36. Ward J, Sodhi J, McGuffin L, Buxton B, Jones D. Prediction and functional analysis of
native disorder in proteins from the three kingdoms of life. J Mol Biol. 2004;337(3):635-645.
37. Berman H, Henrick K, Nakamura H. Announcing the worldwide Protein Data Bank. Nat
Struct Biol 2003;10(12):980.
38. Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid
sequence. Nucleic Acids Res 2007;35(Web Server issue):W460-W464.
39. McGuffin LJ. Intrinsic disorder prediction from the analysis of multiple protein fold
recognition models. Bioinformatics 2008;24(16):1798-1804.
40. Schlessinger A, Punta M, Yachdav G, Kajan L, Rost B. Improved Disorder Prediction by
Combination of Orthogonal Approaches. PLoS ONE. 2009;4(2):e4433.
41. Genesilico metadisorder service [Internet]. [updated 2011; cited 2011]. Available from:
http://iimcb.genesilico.pl/metadisorder/overview.html
42. Ishida T, Kinoshita K. Prediction of disordered regions in proteins based on the meta
approach. Bioinformatics 2008;24(11):1344-1348.
43. Xue B, Dunbrack RL, Williams RW, Dunker AK, Uversky VN. PONDR-FIT: a meta-predictor
of intrinsically disordered amino acids. Biochim Biophys Acta 2010;1804(4):996-1010.
44. Lieutaud P, Canard B, Longhi S. MeDor a metaserver for predicting protein disorder. BMC
Genomics. 2008;9(Suppl 2):S25.
45. Moult J, Fidelis K, Zemla A, Hubbard T. Critical assessment of methods of protein
structure prediction (CASP)-round V. Proteins 2003;53 Suppl 6:334-339.
46. Sickmeier M, Hamilton JA, LeGall T, Vacic V, Cortese MS, Tantos A, et al. DisProt: the
Database of Disordered Proteins. Nucleic Acids Res 2007;35(Database issue):D786-D793.
47. Ahmed A, Villinger S, Gohlke H. Large-scale comparison of protein essential dynamics
from molecular dynamics simulations and coarse-grained normal mode analyses. Proteins.
2010;78(16):3341-3352.
48. Maguid S, Fernández-Alberti S, Parisi G, Echave J. Evolutionary conservation of protein
backbone flexibility. J Mol Evol. 2006;63(4):448-457.
49. Maguid S, Fernandez-Alberti S, Echave J. Evolutionary conservation of protein vibrational
dynamics. Gene. 2008;422(1-2):7-13.
50. Law AB, Fuentes EJ, Lee AL. Conservation of Side-Chain Dynamics Within a Protein
Family. J. Am. Chem. Soc. 2009;131(18):6322-6323.
51. Chen J, Romero P, Uversky V, Dunker A. Conservation of intrinsic disorder in protein
domains and families: I. A database of conserved predicted disordered regions. J Proteome
Res. 2006;5(4):879-887.
52. Dunker A, Garner E, Guilliot S, Romero P, Albrecht K, Hart J, et al. Protein disorder and
the evolution of molecular recognition: theory, predictions and observations. Pac Symp
Biocomput. 1998;1998:473-484.
53. Schlessinger A, Schaefer C, Vicedo E, Schmidberger M, Punta M, Rost B, et al. Protein
disorder-a breakthrough invention of evolution? Curr Opin Struct Biol 2011;
54. Liu J, Tan H, Rost B. Loopy proteins appear conserved in evolution. J Mol Biol.
2002;322(1):53-64.
55. Sonnhammer E, Eddy S, Durbin R. Pfam: a comprehensive database of protein domain
families based on seed alignments. Proteins 1997;28(3):405-420.
56. Tatusov R, Koonin E, Lipman D. A genomic perspective on protein families. Science
1997;278(5338):631-637.
57. Ahn K, Aggarwal B. Transcription factor NF-kappaB: a sensor for smoke and stress signals.
Ann N Y Acad Sci. 2005;1056:218-233.
58. Rangan G, Wang Y, Harris D. NF-kappaB signalling in chronic kidney disease. Front Biosci
2009;14:3496-3522.
59. Courtois G, Gilmore TD. Mutations in the NF-kappaB signaling pathway: implications for
human disease. Oncogene 2006;25(51):6831-6843.
60. Gilmore TD. Introduction to NF-kappaB: players, pathways, perspectives. Oncogene
2006;25(51):6680-6684.
61. Gauthier M, Degnan B. The transcription factor NF-kappaB in the demosponge
Amphimedon queenslandica: insights on the evolutionary origin of the Rel homology
domain. Dev Genes Evol 2008;218(1):23-32.
62. Gómez J, García-Domingo D, Martínez-A C, Rebollo A. Role of NF-kappaB in the control of
apoptotic and proliferative responses in IL-2-responsive T cells. Front Biosci 1997;2:d49-d60.
63. Vallabhapurapu S, Karin M. Regulation and function of NF-kappaB transcription factors in
the immune system. Annu Rev Immunol 2009;:693-733.
64. Hayden MS, Ghosh S. Shared principles in NF-kappaB signaling. Cell 2008;132(3):344-362.
65. Mercurio F, Manning AM. Multiple signals converging on NF-kappaB. Curr Opin Cell Biol
1999;11(2):226-232.
66. Siebenlist U, Franzoso G, Brown K. Structure, regulation and function of NF-kappa B.
Annu Rev Cell Biol 1994;10:405-455.
67. Baldwin AS. The NF-kappa B and I kappa B proteins: new discoveries and insights. Annu
Rev Immunol 1996;14:649-683.
68. Hayden MS, Ghosh S. Signaling to NF-kappaB. Genes Dev 2004;18(18):2195-2224.
69. Karin M. NF-kappaB and cancer: mechanisms and targets. Mol Carcinog 2006;45(6):355-361.
70. Karin M. Nuclear factor-kappaB in cancer development and progression. Nature
2006;441(7092):431-436.
71. Kim HJ, Hawke N, Baldwin AS. NF-kappaB and IKK as therapeutic targets in cancer. Cell
Death Differ 2006;13(5):738-747.
72. Van Waes C. Nuclear factor-kappaB in development, prevention, and therapy of cancer.
Clin Cancer Res 2007;13(4):1076-1082.
73. Wu L, Choe K, Lu Y, Anderson K. Drosophila immunity: genes on the third chromosome
required for the response to bacterial infection. Genetics 2001;159(1):189-199.
74. Gugasyan R, Grumont R, Grossmann M, Nakamura Y, Pohl T, Nesic D, et al. Rel/NF-kappaB transcription factors: key mediators of B-cell activation. Immunol Rev 2000;176:134-140.
75. Bunting K, Rao S, Hardy K, Woltring D, Denyer G, Wang J, et al. Genome-wide analysis of
gene expression in T cells to identify targets of the NF-kappa B transcription factor c-Rel. J
Immunol 2007;178(11):7097-7109.
76. Panzer U, Steinmetz O, Turner J, Meyer-Schwesinger C, von R, Meyer T, et al. Resolution
of renal inflammation: a new role for NF-kappaB1 (p50) in inflammatory kidney diseases. Am
J Physiol Renal Physiol 2009;297(2):F429-F439.
77. Liou H, Hsia C. Distinctions between c-Rel and other NF-kappaB proteins in immunity and
disease. Bioessays 2003;25(8):767-780.
78. Jones W, Brown M, Wilhide M, He S, Ren X. NF-κB in cardiovascular disease: Diverse and
specific effects of a "general" transcription factor? Cardiovasc Toxicol 2005;5(2):183-201.
79. Papin J, Subramaniam S. Bioinformatics and cellular signaling. Curr Opin Biotechnol
2004;15(1):78-81.
80. Huang S, Wikswo J. Dimensions of systems biology. Rev Physiol Biochem Pharmacol.
2006;:81-104.
81. Strange K. The end of "naive reductionism": rise of systems biology or renaissance of
physiology? Am J Physiol Cell Physiol. 2005;288(5):C968-C974.
82. Rizzetto L, Cavalieri D. A systems biology approach to the mutual interaction between
yeast and the immune system. Immunobiology. 2010;
83. Cadeiras M, von B, Sinha A, Shahzad K, Lim W, Grenett H, et al. Drawing networks of
rejection - a systems biological approach to the identification of candidate genes in heart
transplantation. J Cell Mol Med. 2010;
84. Wei L, Fan M, Xu L, Heinrich K, Berry M, Homayouni R, et al. Bioinformatic analysis
reveals cRel as a regulator of a subset of interferon-stimulated genes. J Interferon Cytokine
Res. 2008;28(9):541-551.
85. Gyorffy A, Baranyai Z, Cseh A, Munkácsy G, Jakab F, Tulassay Z, et al. Promoter analysis
suggests the implication of NFkappaB/C-Rel transcription factors in biliary atresia.
Hepatogastroenterology 2008;55(85):1189-1192.
86. Elkon R, Rashi-Elkeles S, Lerenthal Y, Linhart C, Tenne T, Amariglio N, et al. Dissection of a
DNA-damage-induced transcriptional network using a combination of microarrays, RNA
interference and computational promoter analysis. Genome Biol 2005;6(5):R43.
87. Papatsenko D, Levine M. Quantitative analysis of binding motifs mediating diverse spatial
readouts of the Dorsal gradient in the Drosophila embryo. Proc Natl Acad Sci U S A
2005;102(14):4966-4971.
88. Udalova I, Mott R, Field D, Kwiatkowski D. Quantitative prediction of NF-kappa B DNA-protein interactions. Proc Natl Acad Sci U S A 2002;99(12):8167-8172.
89. Linnell J, Mott R, Field S, Kwiatkowski D, Ragoussis J, Udalova I, et al. Quantitative high-throughput analysis of transcription factor binding specificities. Nucleic Acids Res 2004;32(4)
90. Matys V, Fricke E, Geffers R, Gössling E, Haubrock M, Hehl R, et al. TRANSFAC:
transcriptional regulation, from patterns to profiles. Nucleic Acids Res 2003;31(1):374-378.
91. Ip Y, Levine M. Molecular genetics of Drosophila immunity. Curr Opin Genet Dev
1994;4(5):672-677.
92. Waterhouse RM, Kriventseva EV, Meister S, Xi Z, Alvarez KS, Bartholomay LC, et al.
Evolutionary dynamics of immune-related genes and pathways in disease-vector
mosquitoes. Science 2007;316(5832):1738-1743.
93. Wu X, Xiong X, Xie L, Zhang R. Pf-Rel, a Rel/nuclear factor-kappaB homolog identified
from the pearl oyster, Pinctada fucata. Acta Biochim Biophys Sin (Shanghai) 2007;39(7):533-539.
94. Huang X, Yin Z, Liao J, Wang P, Yang L, Ai H, et al. Identification and functional study of a
shrimp Relish homologue. Fish Shellfish Immunol 2009;27(2):230-238.
95. Huang X, Yin Z, Jia X, Liang J, Ai H, Yang L, et al. Identification and functional study of a
shrimp Dorsal homologue. Dev Comp Immunol 2009;
96. Sullivan J, Kalaitzidis D, Gilmore T, Finnerty J. Rel homology domain-containing
transcription factors in the cnidarian Nematostella vectensis. Dev Genes Evol
2007;217(1):63-72.
97. Müller CW, Harrison SC. The structure of the NF-kappa B p50:DNA-complex: a starting
point for analyzing the Rel family. FEBS Lett 1995;369(1):113-117.
98. Muller C, Rey F, Sodeoka M, Verdine G, Harrison S. Structure of the NF-κB p50
homodimer bound to DNA. Nature 1995;373(6512):311-317.
99. Ghosh G, Van Duyne G, Ghosh S, Sigler P. Structure of NF-κB p50 homodimer bound to a
κB site. Nature 1995;373(6512):303-310.
100. Pande V, Sharma R, Inoue J, Otsuka M, Ramos M. A molecular modeling study of
inhibitors of nuclear factor kappa-B (p50)--DNA binding. J Comput Aided Mol Des
2003;17(12):825-836.
101. Mura C, McCammon J. Molecular dynamics of a kappaB DNA element: base flipping via
cross-strand intercalative stacking in a microsecond-scale simulation. Nucleic Acids Res
2008;36(15):4941-4955.
102. Copley R, Totrov M, Linnell J, Field S, Ragoussis J, Udalova I, et al. Functional
conservation of Rel binding sites in drosophilid genomes. Genome Res 2007;17(9):1327-1335.
103. Xue B, Dunker AK, Uversky VN. Retro-MoRFs: Identifying Protein Binding Sites by
Normal and Reverse Alignment and Intrinsic Disorder Prediction. International journal of
molecular sciences 2010;11(10):3725-3747.
104. Liu T, Altman RB. Prediction of calcium-binding sites by combining loop-modeling with
machine learning. BMC Struct Biol 2009;9:72.
105. Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H. Assessing the accuracy of
prediction algorithms for classification: an overview. Bioinformatics 2000;16(5):412-424.
106. Uversky V, Gillespie J, Fink A. Why are "natively unfolded" proteins unstructured under
physiologic conditions? Proteins. 2000;41(3):415-427.
107. Moult J, Fidelis K, Kryshtafovych A, Rost B, Tramontano A. Critical assessment of
methods of protein structure prediction - Round VIII. Proteins 2009;77 Suppl 9:1-4.
108. Huxford T, Malek S, Ghosh G. Structure and mechanism in NF-kappa B/I kappa B
signaling. Cold Spring Harb Symp Quant Biol 1999;:533-540.
109. Bottex-Gauthier C, Pollet S, Favier A, Vidal D. [The Rel/NF-kappa-B transcription factors:
complex role in cell regulation]. Pathol Biol (Paris) 2002;50(3):204-211.
110. Baeuerle PA, Baltimore D. NF-kappa B: ten years after. Cell 1996;87(1):13-20.
111. Thanos D, Maniatis T. NF-kappa B: a lesson in family values. Cell 1995;80(4):529-532.
112. Benson D, Karsch-Mizrachi I, Lipman D, Ostell J, Sayers E. GenBank. Nucleic Acids Res
2009;37(Database issue):D26-D31.
113. Schneider M, Lane L, Boutet E, Lieberherr D, Tognolli M, Bougueleret L, et al. The
UniProtKB/Swiss-Prot knowledgebase and its Plant Proteome Annotation Program. J
Proteomics 2009;72(3):567-573.
114. Altschul S, Gish W, Miller W, Myers E, Lipman D. Basic local alignment search tool. J Mol
Biol 1990;215(3):403-410.
115. Ashburner M, Ball C, Blake J, Botstein D, Butler H, Cherry J, et al. Gene ontology: tool for
the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000;25(1):25-29.
116. Povey S, Lovering R, Bruford E, Wright M, Lush M, Wain H, et al. The HUGO Gene
Nomenclature Committee (HGNC). Hum Genet. 2001;109(6):678-680.
117. Mulder NJ, Apweiler R, Attwood TK, Bairoch A, Bateman A, Binns D, et al. New
developments in the InterPro database. Nucleic Acids Res 2007;35(Database issue):D224-D228.
118. Sayers E, Barrett T, Benson D, Bolton E, Bryant S, Canese K, et al. Database resources of
the National Center for Biotechnology Information. Nucleic Acids Res.
2010;38(Database issue):D5-16.
119. Tong J, Lim S, Muh H, Chew F, Tammi M. Allergen Atlas: a comprehensive knowledge
center and analysis resource for allergen information. Bioinformatics 2009;
120. Tay D, Govindarajan K, Khan A, Ong T, Samad H, Soh W, et al. T3SEdb: data warehousing
of virulence effectors secreted by the bacterial Type III Secretion System. BMC
Bioinformatics. 2010;7
121. Edgar RC. MUSCLE: a multiple sequence alignment method with reduced time and
space complexity. BMC Bioinformatics 2004;
122. EUAsiaGrid – Widening the Uptake of e-Research in the Asia-Pacific Region. eScience,
2008. eScience '08. IEEE Fourth International Conference on; Indianapolis, IN: 2008. 360 p.
123. Hall T. BioEdit: a user-friendly biological sequence alignment editor and analysis
program for Windows 95/98/NT. Nucleic Acids Symp Ser 1999;41:95-98.
124. Pierce JR. An introduction to information theory. Dover Pubns; 1980. 305 p.
125. The PyMOL Molecular Graphics System, Version 0.99rc6 [Internet]. Available from:
http://www.pymol.org
126. Al-Khodor S, Price C, Kalia A, Abu K. Functional diversity of ankyrin repeats in microbial
proteins. Trends Microbiol. 2009;
127. Voronin D, Kiseleva E. [Functional role of proteins containing ankyrin repeats].
Tsitologiia. 2007;49(12):989-999.
128. Li J, Mahajan A, Tsai M. Ankyrin repeat: a unique motif mediating protein-protein
interactions. Biochemistry. 2006;45(51):15168-15178.
129. Mosavi L, Cammett T, Desrosiers D, Peng Z. The ankyrin repeat as molecular
architecture for protein recognition. Protein Sci. 2004;13(6):1435-1448.
130. Park HH, Lo YC, Lin SC, Wang L, Yang JK, Wu H, et al. The death domain superfamily in
intracellular signaling of apoptosis and inflammation. Annu Rev Immunol 2007;25:561-586.
131. Goh GKM, Dunker AK, Uversky VN. A comparative analysis of viral matrix proteins using
disorder predictors. Virol J. 2008;5:126.
132. Smith DK, Radivojac P, Obradovic Z, Dunker AK, Zhu G. Improved amino acid flexibility
parameters. Protein Sci. 2003;12(5):1060-1072.
133. Chirgadze DY, Demydchuk M, Becker M, Moran S, Paoli M. Snapshot of protein
structure evolution reveals conservation of functional dimerization through intertwined
folding. Structure 2004;12(8):1489-1494.
134. Kalodimos CG, Biris N, Bonvin AMJJ, Levandoski MM, Guennuegues M, Boelens R, et al.
Structure and Flexibility Adaptation in Nonspecific and Specific Protein-DNA Complexes.
Science 2004;305(5682):386-389.
135. Huang J, Zhao X, Yu H, Ouyang Y, Wang L, Zhang Q, et al. The ankyrin repeat gene family
in rice: genome-wide identification, classification and expression profiling. Plant Mol Biol.
2009;71(3):207-226.
136. Wu T, Tian Z, Liu J, Yao C, Xie C. A novel ankyrin repeat-rich gene in potato, Star,
involved in response to late blight. Biochem Genet. 2009;47(5-6):439-450.
137. Zhu Y, Kakinuma N, Wang Y, Kiyama R. Kank proteins: a new family of ankyrin-repeat
domain-containing proteins. Biochim Biophys Acta. 2008;1780(2):128-133.
138. Hoffmann A, Baltimore D. Circuitry of nuclear factor κB signaling. Immunol Rev
2006;210:171-186.