Numerical calculations of the pH of maximal protein stability
The effect of the sequence composition and three-dimensional structure
Emil Alexov
Howard Hughes Medical Institute and Columbia University, Biochemistry Department, New York, USA
A large number of proteins, found experimentally to have different optimum pH of maximal stability, were studied to reveal the basic principles of their preference for a particular pH. The pH-dependent free energy of folding was modeled numerically as a function of pH, as was the net charge of the protein. The optimum pH was determined in the numerical calculations as the pH of the minimum free energy of folding. The experimental data for the pH of maximal stability (experimental optimum pH) could be reproduced (rmsd = 0.73). It was shown that the optimum pH results from two factors: amino acid composition and the organization of the titratable groups within the 3D structure. It was demonstrated that the optimum pH and isoelectric point could be quite different. In many cases, the optimum pH was found at a pH corresponding to a large net charge of the protein. At the same time, there was a tendency for proteins having acidic optimum pHs to have a base/acid ratio smaller than one and vice versa. The correlation between the optimum pH and base/acid ratio is significant if only buried groups are taken into account. It was shown that a protein that provides a favorable electrostatic environment for acids and disfavors the bases tends to have a high optimum pH and vice versa.
Keywords: electrostatics; pH stability; pKa; optimum pH.
The concentration of hydrogen ions (expressed as pH) is an important factor that affects protein function and stability in different locations in the cell and in the body [1]. Physiological pH varies among organs in the human body: the pH in the digestive tract ranges from 1.5 to 7.0, in the kidney it ranges from 4.5 to 8.0, and body liquids have a pH of 7.2–7.4 [2]. It was shown that the interstitial fluid of solid tumors has pH = 6.5–6.8, which differs from the physiological pH of normal tissue and thus can be used for the design of pH-selective drugs [3].
The structure and function of most macromolecules are influenced by pH, and most proteins operate optimally at a particular pH (optimum pH) [4]. On the basis of indirect measurements, it has been found that the intracellular pH usually ranges between 4.5 and 7.4 in different cells [5]. The pH of an organelle affects protein function, and variation of pH away from normal could be responsible for drug resistance [6]. Lysosomal enzymes function best at the low pH of 5 found in lysosomes, whereas cytosolic enzymes function best at the close to neutral pH of 7.2 [1].
Experimental studies of pH-dependent properties [7–11] such as stability, solubility and activity provide the benchmarks for numerical simulation. Experiments revealed that although the net charge of ribonuclease Sa does affect the solubility, it does not affect the pH of maximal stability or activity [12]. Other experimental techniques, such as acidic or basic denaturation [13–15], demonstrate the importance of electrostatic interactions for protein stability.
pH-dependent phenomena have been extensively modeled using numerical approaches [16–19]. A typical task is to compute the pKas of ionizable groups [20–26], the isoelectric point [27,28] or the electrostatic potential distribution around the active site [29]. It was shown that the activity of nine lipases correlates with the pH dependence of the electrostatic potential mapped on the molecular surface of the molecules [29]. The pH dependence of the unfolding energy was modeled extensively and the models reproduced reasonably well the experimental denaturation free energy as a function of pH [19,30–36].
The success of the numerical protocol to compute the pH dependence of the free energy depends on the model of the unfolded state, the model of the folded state and thus on the calculated pKas. It is well recognized that the unfolded state is compact and native-like, but the magnitude of the residual pairwise interactions and the desolvation energies has been debated. Some of the studies found that any residual structure of the unfolded state has a negligible effect on the calculated pH dependence of the unfolding free energy [31], while others found the opposite [33–36]. It was estimated that the pKas of the acidic groups in the unfolded state are shifted by −0.3 pK units with respect to the pKas of model compounds. Although including the measured and simulated pK shifts into the model of the unfolded state changes the pH dependence of the unfolding free energy, in most of the cases it does not
change the pH of maximal stability [33–36]. Much more important is the modeling of the folded state, where the errors of computing pKas could be significantly larger than 0.3 units. Over the years there has been a continuous effort to develop methods for accurate pKa predictions [20,21]. These include empirical methods [37], macroscopic methods [38–41], finite difference Poisson–Boltzmann (FDPB)-based methods [20–22,42], FDPB and molecular dynamics [43–45], FDPB and molecular mechanics [25,46,47] and Warshel's microscopic methods (e.g. [16,17]). The predicted pKas were benchmarked against the experimental data and the average rmsd was found to vary from the best value of 0.5 pK units [38], to 0.7 [48], to 0.83 [25] and to 0.89 [22]. The Multi-Conformation Continuum Electrostatics (MCCE) method [25] was shown to be among the best pKa predictors and it is employed in this work.
Correspondence to E. Alexov, Howard Hughes Medical Institute and Columbia University, Biochemistry Department, 630 W 168 Street, New York, NY 10032, USA.
Fax: + 1 212 305 6926, Tel.: + 1 212 305 0265,
E-mail: ea388@columbia.edu
Abbreviations: MCCE, multi-conformation continuum electrostatic; SAS, solvent accessible surface.
(Received 15 September 2003, accepted 11 November 2003)
Eur. J. Biochem. 271, 173–185 (2004) © FEBS 2003 doi:10.1046/j.1432-1033.2003.03917.x
In the present work we compute the pH dependence of the free energy of folding and of the net charge. The optimum pH was identified as the pH at which the free energy of folding has a minimum. A large number of proteins having different optimum pH [49] were studied to find the effect of the amino acid composition and 3D structure on the optimum pH.
Experimental procedures
Methods
Calculations were carried out using available 3D structures of selected proteins. A text search was performed on the BRENDA database [49] in the field 'pH of stability'. The following search strings were used: 'maximal stability', 'maximum stability', 'optimal stability', 'optimum stability', 'best stability', 'highest stability' and 'greatest stability'. This revealed 168 proteins with experimentally determined pHs of maximal stability. Then a search of the Protein Data Bank (PDB) was performed to find available structures for these proteins. An attempt was made to select PDB structures of proteins from the same species as those used in the experiment (43 structures). Structures with missing residues were omitted, as were the structures of proteins participating in large complexes, resulting in a final set of 28 protein structures. The protein names, the PDB file names and the experimental pH of maximal stability are provided in Table 1. The source of the data is the BRENDA database and thus the present study is limited to the proteins listed there. There will always be proteins with experimentally determined optimum pH that were not in the database, and therefore are not modeled in the paper. However, an additional four well studied proteins were used to benchmark the method in a broad pH range and to compare the effect of mutations.

Table 1. Proteins and corresponding PDB [57] files used in the paper. The experimental optimum pH (pH of optimal stability) is taken from the BRENDA website [49]. The calculated optimum pH (the pH of the minimum of the free energy of folding) is given in the fourth column. The difference is the calculated optimum pH minus the experimental number (fifth column). The base/acid ratio for all ionizable groups is in the sixth column, while the seventh shows the base/acid ratio for 66% buried groups. The last three columns show the averaged intrinsic pK shift, the averaged pKa shift and the net charge of the folded protein at the optimum pH, respectively.

| Protein | PDB code | Experimental optimum pH | Calculated optimum pH | Difference | Base/acid ratio | Buried base/acid ratio | Averaged intrinsic pK shift | Averaged pKa shift | Net charge at optimum pH |
|---|---|---|---|---|---|---|---|---|---|
| Dioxygenase | 1b4u | 8.0 | 8.0 | 0.0 | 0.94 | 1.33 | 0.08 | −0.51 | −3.0 |
| Transferase | 1f8x | 6.5 | 5.0 | −1.5 | 0.72 | 0.28 | 0.40 | 0.34 | −5.5 |
| Glutathione synthetase | 1sga | 8.0 | 7.5 | −0.5 | 0.87 | 0.88 | 0.41 | −0.58 | −10.0 |
| Isomerase | 1b0z | 6.0 | 6.0 | 0.0 | 1.02 | 0.90 | 0.05 | −0.48 | 2.1 |
| Coenzyme A | 1bdo | 6.5 | 7.0 | 0.5 | 0.67 | 1.50 | 0.22 | 0.03 | −4.1 |
| Dienelactone hydrolase | 1din | 7.0 | 6.5 | −0.5 | 1.04 | 1.17 | 0.26 | −0.36 | −2.7 |
| Dehydrogenase | 1dpg | 6.2 | 6.0 | −0.2 | 0.79 | 1.05 | 0.38 | −0.41 | −13.0 |
| Endothiapepsin | 1gvx | 4.15 | 4.0 | −0.15 | 0.52 | 0.07 | 1.45 | 2.06 | 6.5 |
| Dehydratase | 1aw5 | 9.0 | 9.0 | 0.0 | 1.07 | 0.85 | 0.17 | −0.48 | −6.8 |
| Cathepsin B | 1huc | 5.15 | 5.0 | −0.15 | 0.90 | 0.73 | 1.28 | 0.11 | 5.8 |
| Alginate lyase | 1hv6 | 7.0 | 7.0 | 0.0 | 1.17 | 0.93 | 0.63 | −0.72 | 2.7 |
| Xylanase | 1igo | 5.5 | 6.5 | 1.0 | 1.41 | 1.00 | 0.60 | −0.74 | 7.3 |
| Hydrolase | 1iun | 7.5 | 7.0 | −0.5 | 0.86 | 1.50 | 0.11 | −1.15 | −1.1 |
| Aspartic protease | 1j71 | 4.15 | 3.0 | −1.15 | 0.54 | 0.33 | 0.98 | 1.32 | 9.4 |
| Aldolase | 1jcj | 8.5 | 8.5 | 0.0 | 0.97 | 0.54 | 0.55 | −0.19 | −5.1 |
| L-Asparaginase | 1jsl | 8.5 | 7.0 | −1.5 | 1.17 | 1.85 | −0.12 | −0.83 | −0.1 |
| Amylase | 1lop | 5.9 | 6.0 | 0.1 | 0.81 | 1.00 | 0.33 | −0.42 | −8.2 |
| γ-Glutamyl hydrolase | 1l9x | 7.0 | 7.5 | 0.5 | 1.19 | 0.77 | 0.45 | −0.02 | 2.8 |
| Mutase | 1m1b | 7.0 | 6.0 | −1.0 | 0.95 | 0.86 | 0.25 | 0.13 | −3.2 |
| Metapyrocatechase | 1mpy | 7.7 | 7.0 | −0.7 | 1.0 | 1.33 | 0.11 | −1.35 | −12.0 |
| Pyruvate oxidase | 1pow | 5.7 | 6.0 | 0.3 | 0.91 | 0.78 | 0.60 | −0.51 | −2.0 |
| Chitosanase | 1qgi | 6.0 | 6.5 | 0.5 | 1.09 | 0.54 | 0.29 | −0.31 | 5.0 |
| Xylose isomerase | 1qt1 | 8.0 | 8.0 | 0.0 | 0.84 | 1.50 | 0.24 | −0.30 | −16.0 |
| Pyruvate decarboxylase | 1zpd | 6.0 | 7.0 | 1.0 | 1.02 | 0.83 | 0.47 | −0.24 | 3.8 |
| Acid α-amylase | 2aaa | 4.9 | 4.0 | −0.90 | 0.51 | 0.64 | 1.53 | 1.48 | −1.7 |
| Formate dehydrogenase | 2nac | 5.6 | 7.0 | 1.40 | 1.11 | 1.42 | 0.06 | −1.1 | 2.4 |
| Phosphorylase | 2tpt | 6.0 | 5.0 | −1.0 | 0.91 | 0.93 | 0.38 | −0.34 | −3.8 |
| β-Amylase | 5bca | 5.5 | 5.0 | −0.5 | 1.07 | 0.91 | 0.19 | −0.13 | 15.1 |
Free energy and net charge of unfolded state
The unfolded state is modeled as a chain of noninteracting amino acids (the possibility of residual interactions in the unfolded state is discussed at the end of the Discussion section). Thus, the free energy of ionizable groups (pH-dependent free energy) is calculated as [31]:

$$\Delta G_{unf} = -kT \ln(Z_{unf}) = -kT \sum_{i=1}^{N} \ln\{1 + \exp[-2.3\,\gamma(i)(\mathrm{pH} - \mathrm{p}K_{sol}(i))]\} \tag{1}$$

where k is the Boltzmann constant, T is the temperature in kelvin, N is the number of ionizable groups, γ(i) is 1 for bases and −1 for acids, pK_sol(i) is the standard pKa value in solution of group i (e.g. [47]) and pH is the pH of the solution. Z_unf is the partition function of the unfolded state and ΔG_unf is the free energy of the unfolded state. The reference state of zero free energy is defined as the state with all groups in their neutral forms [31].
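Equation 1 can be evaluated directly from a list of ionizable groups. The following is a minimal Python sketch, not the code used in the paper: the pK_sol values, the example composition and all names are illustrative assumptions, and energies are expressed in units of kT.

```python
import math

LN10 = math.log(10.0)  # the factor written as 2.3 in Eqn 1

# Assumed model-compound pK values with gamma = -1 (acid) or +1 (base).
# These numbers are illustrative only and are not taken from the paper.
PK_SOL = {"ASP": (4.0, -1), "GLU": (4.4, -1), "HIS": (6.5, +1), "LYS": (10.4, +1)}


def unfolded_free_energy(groups, ph):
    """Eqn 1 in units of kT for a list of residue names."""
    total = 0.0
    for name in groups:
        pk_sol, gamma = PK_SOL[name]
        total -= math.log1p(math.exp(-LN10 * gamma * (ph - pk_sol)))
    return total


# Hypothetical composition: 3 Asp, 2 Glu, 1 His and 4 Lys.
groups = ["ASP"] * 3 + ["GLU"] * 2 + ["HIS"] + ["LYS"] * 4
profile = {ph: unfolded_free_energy(groups, ph) for ph in range(15)}
ph_of_maximum = max(profile, key=profile.get)
```

For this made-up composition the profile is negative everywhere, falls off toward both pH extremes and peaks at an intermediate pH, which is the qualitative shape expected from Eqn 1.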
The net charge is calculated using the standard formula that comes from the Henderson–Hasselbalch equation:

$$q_{unf} = \sum_{i=1}^{N} \gamma(i)\,\frac{10^{-\gamma(i)(\mathrm{pH} - \mathrm{p}K_{sol}(i))}}{1 + 10^{-\gamma(i)(\mathrm{pH} - \mathrm{p}K_{sol}(i))}} \tag{2}$$

where γ(i) = −1 or +1 in the case of an acid or a base, respectively.
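As a worked illustration of Eqn 2, the sketch below sums the fractional charges of noninteracting groups. The function name and the two example groups are assumptions made for illustration only.

```python
def unfolded_net_charge(groups, ph):
    """Eqn 2: net charge of noninteracting groups given as (pK_sol, gamma)."""
    charge = 0.0
    for pk_sol, gamma in groups:
        x = 10.0 ** (-gamma * (ph - pk_sol))
        charge += gamma * x / (1.0 + x)  # ionized fraction times its sign
    return charge


# Hypothetical groups: one acid with pK_sol 4.0 and one base with pK_sol 10.4.
example = [(4.0, -1), (10.4, +1)]
charge_at_acid_pk = unfolded_net_charge(example, 4.0)   # acid half ionized
charge_at_midpoint = unfolded_net_charge(example, 7.2)  # charges cancel
```

At the pK_sol of the acid the acid carries a charge of −0.5 while the base is almost fully protonated, so the net charge is close to +0.5; midway between the two pK values the two contributions cancel.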
Free energy and net charge of the folded state
The pH-dependent free energy of the folded state is calculated using the 3D structure of the proteins listed in Table 1. The 3D structure comprises N ionizable groups (the same number as in the unfolded state) and L polar groups. Each of them might have several alternative side-chain rotamers [50], or alternative polar proton positions [47]. In addition, ionizable groups are either ionized or neutral. All these alternatives are called 'conformers', being ionizational and positional conformers. There is no a priori information to indicate which conformer is most likely to exist under given conditions of, for example, pH and salt concentration. Each microstate is comprised of one conformer per residue. The Monte Carlo method was used to estimate the probability of microstates. This procedure is called multi-conformation continuum electrostatics (MCCE) and it is described in more detail elsewhere [25,47,50]. A brief summary of the MCCE method is provided in a later section.
To find the free energy one should calculate the partition function for each of the proteins. Thus, one should construct all possible combinations of conformers. Because of the very large number of conformers (in most cases more than 1000), the Monte Carlo method (Metropolis algorithm [51]) is used to find the probability of the microstates [20,47,50,52]. However, to construct the partition function one should know all microstate energies and sum their Boltzmann factors. Each microstate energy should be taken only once, which introduces an extra level of complexity. A special procedure was designed that collects the lowest microstate energies and ensures that each microstate is taken only once [50]. A microstate was considered to be unique if its energy differs by more than 0.001 kT from the energies of all previously generated states. A much more stringent procedure that compares the microstate composition would require significant computation time and therefore was not implemented. This results in a function that estimates the partition function. This effective partition function will not include the states with high energy (they are rejected by the Metropolis algorithm), but they have a negligible effect [53]. In addition, the constructed partition function may not include all low energy microstates, because a given microstate may not be generated in the Monte Carlo sampling or because two or more distinct microstates may have identical or very similar energies. Bearing in mind all these possibilities, the effective partition function (Z_fol) is calculated as [50]:

$$Z_{fol} = \sum_{n=1}^{X_{fol}} \exp(-\Delta G^{fol}_{n}/kT) \tag{3}$$

where ΔG^fol_n is the energy of microstate n and X_fol is the number of microstates collected in the Monte Carlo procedure. Then the free energy of ionizable and polar groups in the folded state is:

$$\Delta G_{fol} = -kT \ln(Z_{fol}) \tag{4}$$

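The construction of the effective partition function can be sketched as follows. This is a simplified illustration under stated assumptions rather than the MCCE implementation: energies are in kT, the function name is hypothetical, and the sampled energies are sorted first, so the lowest member of each group of near-identical energies is kept rather than the first one generated.

```python
import math


def effective_free_energy(sampled_energies_kt, tol=0.001):
    """Return (-ln Z_fol in kT, number of unique states), following Eqns 3 and 4.

    A sampled energy is kept only if it differs by more than `tol` kT from
    every energy kept so far, mimicking the energy-based uniqueness test.
    """
    unique = []
    for energy in sorted(sampled_energies_kt):
        # With sorted input, comparing with the last kept energy is enough.
        if not unique or energy - unique[-1] > tol:
            unique.append(energy)
    e_min = unique[0]
    # Factor out the lowest energy before exponentiating to avoid overflow.
    z_scaled = sum(math.exp(-(e - e_min)) for e in unique)
    return e_min - math.log(z_scaled), len(unique)


# Hypothetical Monte Carlo output containing repeated visits to the same state.
g_fol, n_unique = effective_free_energy([0.0, 0.0, 0.0005, 1.0, 1.0, 5.0])
```

The example also shows the limitation noted in the text: two genuinely different microstates whose energies differ by less than the tolerance are counted once.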
The occupancy of each conformer (ρ^fol_i) [52] is calculated in the Metropolis algorithm and then used to calculate the net charge of the folded state:

$$q_{fol} = \sum_{i=1}^{M} \rho^{fol}_{i}\,\gamma(i) \tag{5}$$

where M is the total number of conformers. [Note that γ(i) = 0 for nonionizable conformers.]
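Equation 5 is a weighted sum over conformers. A short sketch with hypothetical occupancies (the numbers and names below are assumptions for illustration):

```python
def folded_net_charge(occupancy, gamma):
    """Eqn 5: sum over conformers of occupancy times gamma (0 if neutral)."""
    if len(occupancy) != len(gamma):
        raise ValueError("one gamma value is needed per conformer")
    return sum(p * g for p, g in zip(occupancy, gamma))


# Hypothetical Asp (neutral, ionized) and Lys (neutral, ionized) conformers;
# the occupancies of the conformers of one residue sum to 1.
occupancy = [0.2, 0.8, 0.1, 0.9]
gamma = [0, -1, 0, +1]
q_fol = folded_net_charge(occupancy, gamma)
```

Here the Asp contributes −0.8 and the Lys +0.9, giving a net charge of about +0.1.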
Free energy of folding
The pH-dependent free energy of folding is calculated as the difference between the free energies of the folded and unfolded states:

$$\Delta\Delta G_{folding} = \Delta G_{fol} - \Delta G_{unf} \tag{6}$$

An alternative formula for calculating the pH dependence of the free energy of folding is [19,31,54,55]:

$$\Delta\Delta G_{folding} = 2.3\,kT \int_{\mathrm{pH}_1}^{\mathrm{pH}_2} \Delta q \, d\mathrm{pH} \tag{7}$$

where pH_1 and pH_2 determine the pH interval and Δq is the change of the net charge of the protein from the unfolded to the folded state.
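The equivalence of Eqns 6 and 7 can be checked numerically on a toy model. In the sketch below both states are treated as noninteracting groups, with the folded state simply given shifted pK values; this is an assumption made only so that both formulas can be evaluated in closed form, and it is not how the folded state is modeled in the paper. Energies are in kT and ln 10 is used for the factor 2.3.

```python
import math

LN10 = math.log(10.0)


def free_energy(groups, ph):
    """Eqn 1 form in kT for noninteracting groups given as (pK, gamma)."""
    return -sum(math.log1p(math.exp(-LN10 * g * (ph - pk))) for pk, g in groups)


def net_charge(groups, ph):
    """Eqn 2 form for the same groups."""
    total = 0.0
    for pk, g in groups:
        x = 10.0 ** (-g * (ph - pk))
        total += g * x / (1.0 + x)
    return total


# Toy assumption: folded-state pKs are shifted relative to the unfolded ones.
unfolded = [(4.0, -1), (6.5, +1), (10.4, +1)]
folded = [(3.0, -1), (5.5, +1), (10.4, +1)]


def folding_energy(ph):
    """Eqn 6 at one pH."""
    return free_energy(folded, ph) - free_energy(unfolded, ph)


def ddg_direct(ph1, ph2):
    """Change of Eqn 6 between pH1 and pH2."""
    return folding_energy(ph2) - folding_energy(ph1)


def ddg_integral(ph1, ph2, steps=2000):
    """Eqn 7 evaluated with the trapezoidal rule."""
    h = (ph2 - ph1) / steps
    dq = [net_charge(folded, ph1 + k * h) - net_charge(unfolded, ph1 + k * h)
          for k in range(steps + 1)]
    return LN10 * h * (sum(dq) - 0.5 * (dq[0] + dq[-1]))
```

Calling `ddg_direct(2.0, 9.0)` and `ddg_integral(2.0, 9.0)` gives the same number to within the integration error, because the pH derivative of the Eqn 1 free energy equals 2.3 kT times the net charge of Eqn 2.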
Computational method: MCCE method
The basic principles of the method have been described
elsewhere [47,50]. The MCCE [25] method allows us to find
the equilibrated conformation and ionization states of
protein side chains, buried waters, ions, and ligands. The
method uses multiple preselected choices for atomic posi-
tions and ionization states for many selected side chains and
ligands. Then, electrostatic and nonelectrostatic energies
are calculated, providing look-up tables of conformer self-
energies and conformer–conformer pairwise interactions.
Protein microstates are then constructed by choosing one
conformer for each side chain and ligand. Monte Carlo
sampling then uses each microstate energy to find each
conformer’s probability.
Thus, the MCCE procedure is divided into three stages:
(a) selection of residues and generation of conformers; (b)
calculation of energies and (c) Monte Carlo sampling.
Selection of residues. The amino acids that are involved in strong electrostatic interactions (magnitude > 3.5 kT) are selected. They are provided with extra side-chain rotamers to reduce the effects of possible imperfections of crystal structures. The reason is that a small change in their position might cause a significant change in the pairwise interactions [56]. The threshold of 3.5 kT was chosen based on extensive modeling of structures and fitting to experimentally determined quantities [25]. The selection is made by calculating the electrostatic interactions using the original PDB [57] structure. The alternative side chains for these selected residues are built using a standard library of rotamers [58] and by adding an extra side chain position using a procedure developed in Honig's laboratory [59]. The backbone is kept rigid. Then the original structure and alternative side chains are provided with hydrogen atoms. Polar protons of the side chains are assigned by satisfying all hydrogen acceptors and avoiding all hydrogen donors [25]. Thus, every polar side chain and the neutral forms of acids have alternative polar proton positions.
Calculation of energies. The alternative side chains and polar proton positions determine the conformational space for a particular structure, and they are called 'conformers'. The next step is to compute the energies of each conformer and to store them in look-up tables. Because of conformational flexibility, the energy is no longer only electrostatic in origin, but also has a nonelectrostatic component [47,50].
Electrostatic energies are calculated by DelPhi [60,61], using the PARSE [62] charge and radii set. The internal dielectric constant is 4 [63], while the solution dielectric constant is taken to be 80. The molecular surface is generated with a water probe of radius 1.4 Å [64]. The ionic strength is 0.15 M and the linear Poisson–Boltzmann equation is used. A focusing technique [65] was employed to achieve a grid resolution of about two grids per Ångström. The M calculations, where M is the number of conformers, produce a vector of length M for the reaction field energy ΔG_rxn,i and an M × M array of the pairwise interactions between all possible conformers, ΔG^el_ij. In addition, each conformer has pairwise electrostatic interactions with the backbone, resulting in a vector of length M, ΔG_pol,i. The magnitude of the strong pairwise and backbone interactions is altered as described in [56]. Such a correction was shown to improve significantly the accuracy of the calculated pKas [25].
Having alternative side chains and polar hydrogen positions requires nonelectrostatic energy to be taken into account too. This energy is a constant in calculations that use a 'rigid' protein structure (and therefore need not be calculated), but in MCCE it plays an important role in discriminating between alternative positional conformers. The nonelectrostatic interactions for each conformer are the torsion energy, a self-energy term which is independent of the position of all other residues in the protein, and the pairwise Lennard–Jones interactions, both with portions of the protein that are held rigid, and with conformers of side chains that have different allowed positions [25,47,50].
Thus, the pH-dependent free energy of microstate n of the folded state is [20,21,47,50]:

$$\Delta G^{fol}_{n} = \sum_{i=1}^{M} \Big\{ 2.3\,kT\,\delta_n(i)\big[\gamma(i)(\mathrm{pH} - \mathrm{p}K_{sol}(i)) + \Delta\mathrm{p}K_{int}(i)\big] + \sum_{j=i+1}^{M} \delta_n(i)\,\delta_n(j)\,(G^{el}_{ij} + G^{nonel}_{ij}) \Big\};$$

$$\Delta\mathrm{p}K_{int}(i) = \Delta\mathrm{p}K_{solv}(i) + \Delta\mathrm{p}K_{dip}(i) + \Delta\mathrm{p}K_{nonel}(i) \tag{8}$$

where δ_n(i) is 1 if the ith conformer is present in the nth microstate (and 0 otherwise), M is the total number of conformers, ΔpK_int(i) is the electrostatic and nonelectrostatic permanent energy contribution to the energy of conformer i (note that it does not contain interactions with polar groups), γ(i) is 1 for bases, −1 for acids and 0 for neutral groups, ΔpK_solv(i) is the change of solvation energy of group i, ΔpK_dip(i) is the electrostatic interaction with permanent charges, ΔpK_nonel(i) is the nonelectrostatic energy with the rigid part of the protein, and G^el_ij and G^nonel_ij are the pairwise electrostatic and nonelectrostatic interactions, respectively, between conformers i and j.
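The structure of Eqn 8 (a self term per occupied conformer plus pairwise terms between occupied conformers) can be illustrated with a small look-up-table sketch. The data layout, names and numbers are hypothetical and are not taken from MCCE; energies are in kT.

```python
KT_PER_PK_UNIT = 2.3  # the factor 2.3 of Eqn 8, with energies in kT


def microstate_energy(state, conformers, pairwise, ph):
    """Eqn 8 for one microstate.

    state      : ids of the conformers present in the microstate
    conformers : list of dicts with keys 'gamma', 'pk_sol' and 'dpk_int'
    pairwise   : dict {(i, j): G_el + G_nonel in kT} with i < j
    """
    chosen = sorted(state)
    energy = 0.0
    for pos, i in enumerate(chosen):
        c = conformers[i]
        energy += KT_PER_PK_UNIT * (c["gamma"] * (ph - c["pk_sol"]) + c["dpk_int"])
        for j in chosen[pos + 1:]:
            energy += pairwise.get((i, j), 0.0)
    return energy


# Hypothetical Asp/Lys pair: conformers 0 and 2 are neutral, 1 and 3 ionized.
conformers = [
    {"gamma": 0, "pk_sol": 0.0, "dpk_int": 0.0},
    {"gamma": -1, "pk_sol": 4.0, "dpk_int": 1.0},
    {"gamma": 0, "pk_sol": 0.0, "dpk_int": 0.0},
    {"gamma": +1, "pk_sol": 10.4, "dpk_int": 0.5},
]
pairwise = {(1, 3): -3.0}  # favorable interaction between the ionized forms
e_ion_pair = microstate_energy((1, 3), conformers, pairwise, ph=7.0)
e_neutral = microstate_energy((0, 2), conformers, pairwise, ph=7.0)
```

In this made-up case the ion pair at pH 7 has an energy of about −14.3 kT relative to the all-neutral reference microstate, whose energy is zero by construction.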
Monte Carlo sampling. The Monte Carlo algorithm is used to estimate the occupancy (the probability) of each conformer at a given pH. The convergence is considered successful if the average fluctuation of the occupancy is smaller than 0.01 [25]. The pH at which the magnitude of the net charge of a given titratable group is 0.5 is denoted pK½. To adopt a common nomenclature, pK½ will be referred to as pKa throughout the text.
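A generic Metropolis scheme for estimating conformer occupancies is sketched below. It is a textbook single-residue-move sampler written for illustration, not the MCCE sampling code; the toy energy tables, the fixed number of steps and the fixed random seed are all assumptions.

```python
import math
import random


def metropolis_occupancy(residues, energy, steps=20000, seed=0):
    """Estimate conformer occupancies by Metropolis sampling.

    residues : list of lists; residues[r] holds the conformer ids of residue r
    energy   : function mapping a microstate (tuple of ids) to an energy in kT
    """
    rng = random.Random(seed)
    state = [confs[0] for confs in residues]
    e_old = energy(tuple(state))
    counts = {c: 0 for confs in residues for c in confs}
    for _ in range(steps):
        r = rng.randrange(len(residues))
        trial = list(state)
        trial[r] = rng.choice(residues[r])
        e_new = energy(tuple(trial))
        # Accept downhill moves always, uphill moves with probability exp(-dE).
        if e_new <= e_old or rng.random() < math.exp(-(e_new - e_old)):
            state, e_old = trial, e_new
        for c in state:
            counts[c] += 1
    return {c: n / steps for c, n in counts.items()}


# Toy look-up tables (hypothetical numbers, in kT).
SELF = {0: 0.0, 1: -1.0, 2: 0.0, 3: 0.5}
PAIR = {(1, 3): -2.0}


def toy_energy(state):
    e = sum(SELF[c] for c in state)
    for a in range(len(state)):
        for b in range(a + 1, len(state)):
            e += PAIR.get((state[a], state[b]), 0.0)
    return e


occ = metropolis_occupancy([[0, 1], [2, 3]], toy_energy)
```

For a system this small the occupancies can also be obtained by exact enumeration of the four microstates, which provides a simple check of the sampler.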
Optimum pH, isoelectric point (pI) and bases/acids ratio
The experimental pH of maximal stability for each of the proteins listed in Table 1 is taken from the BRENDA website [49]. The database does not always provide a single number for the optimum pH. If a given protein is reported to be stable in a range of pHs, then the optimum pH is taken to be the middle of the pH range.
The optimum pH in the numerical calculation is determined as the pH at which the free energy of folding has a minimum. In the case that the free energy of folding has a minimum in a pH interval, the optimum pH is the middle of the interval. The calculations were carried out in steps of ΔpH = 1. Thus, the computational resolution of determining the pH optimum was 0.5 pH units.
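The rule for locating the optimum pH on a grid can be sketched as follows. The tolerance used to decide that two grid points share the minimum is an assumption of this sketch, as are the function name and the example profile.

```python
def optimum_ph(ph_values, ddg, tol=1e-6):
    """pH of the minimum folding free energy; the midpoint if the minimum is flat.

    ph_values and ddg are parallel lists. Grid points within `tol` of the
    minimum are treated as one interval (tol is not taken from the paper).
    """
    g_min = min(ddg)
    at_min = [ph for ph, g in zip(ph_values, ddg) if g - g_min <= tol]
    return 0.5 * (min(at_min) + max(at_min))


# Hypothetical folding free energy profile sampled in steps of 1 pH unit.
ph_grid = [3, 4, 5, 6, 7, 8]
profile = [2.0, 0.5, -1.0, -1.0, 0.3, 1.8]
ph_opt = optimum_ph(ph_grid, profile)
```

With a grid step of 1, a minimum shared by pH 5 and pH 6 is reported as 5.5, which is why the resolution of the calculated optimum is 0.5 pH units.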
The calculated and experimental pH intervals were not compared, because in many cases the BRENDA database provides only the pH of optimal stability. In addition, in most cases the experimental pH interval of stability given in the BRENDA database does not provide information on the free energy change that the protein can tolerate and still be stable. Therefore it cannot be compared with the numerical results, which provide only the pH dependence of the folding free energy. Some proteins may tolerate a free energy change of 10 kcal·mol⁻¹ and still be stable, while others become unstable upon a change of only a few kcal·mol⁻¹.
The calculated isoelectric point (pI) is the pH at which the net charge of the folded state is equal to zero. There are practically no experimental data for the pI of the proteins listed in Table 1. The net charge at optimum pH is the calculated net charge of the folded protein at the pH optimum. The base/acid ratio was calculated by counting all Asp and Glu residues as acids and all Arg, Lys and His residues as bases. In some cases, one or more acidic and/or His residues were calculated to be neutral at a particular pH optimum, but they were still counted. The reason for this was to avoid the bias of the 3D structure and to calculate the base/acid ratio purely from the sequence. A given residue is counted as 66% buried if its solvent accessible surface (SAS) is one-third of its SAS in solution. Averaged intrinsic pK shifts were calculated as

$$\frac{1}{N}\sum_{i=1}^{N}\big(\mathrm{p}K_{int}(i) - \mathrm{p}K_{sol}(i)\big)$$

and the averaged pKa shift as

$$\frac{1}{N}\sum_{i=1}^{N}\big(\mathrm{p}K_{a}(i) - \mathrm{p}K_{sol}(i)\big)$$

Thus, a negative pK shift corresponds to conditions such that the protein stabilizes acids and destabilizes bases, and vice versa. Arginines were not included in the calculations because their pKas are calculated in many cases to be outside the calculated pH range.
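The sequence-based base/acid ratio and the averaged pK shift are simple to compute. The sketch below uses one-letter residue codes; the function names, the example sequence and the example pK values are illustrative assumptions.

```python
def base_acid_ratio(sequence):
    """Number of bases (Arg, Lys, His) divided by number of acids (Asp, Glu)."""
    seq = sequence.upper()
    bases = sum(seq.count(aa) for aa in "RKH")
    acids = sum(seq.count(aa) for aa in "DE")
    if acids == 0:
        raise ValueError("sequence contains no Asp or Glu")
    return bases / acids


def average_shift(pk_values, pk_sol_values):
    """Mean of pK(i) - pK_sol(i) over the ionizable groups."""
    if not pk_values or len(pk_values) != len(pk_sol_values):
        raise ValueError("need two non-empty lists of equal length")
    diffs = [p - s for p, s in zip(pk_values, pk_sol_values)]
    return sum(diffs) / len(diffs)


ratio = base_acid_ratio("MKDEHRAKDE")               # 4 bases and 4 acids
shift = average_shift([3.0, 11.0], [4.0, 10.4])     # hypothetical pK values
```

In the second call one pK is lowered by 1.0 and the other raised by 0.6, so the averaged shift is −0.2.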
Results
Origin of optimum pH
The paper reports the pH dependence of the free energy of folding. Despite the differences among the calculated proteins, the results show that the pH-dependence profile of the free energy of folding is approximately bell-shaped and has a minimum at a certain pH, referred to throughout the paper as the optimum pH.
To better understand the origin of the optimum pH, a particular case will be considered in detail. Figure 1A shows the free energies of cathepsin B calculated in the pH range 0–14. Three energies were computed: the free energy of the unfolded state (bottom line), the free energy of the folded state (middle line) and the free energy of folding (top curve). For the sake of convenience the free energies of the folded state and of folding are shifted by additive constants so as to have the same magnitude as the free energy of the unfolded state at the pH of the extreme value (in this case pH = 5). This improves the resolution of the graph without changing its interpretation, because the energies contain an undetermined constant (hydrophobic interactions, entropy change, van der Waals interactions and other pH-independent energies).
Free energy of unfolded state. It can be seen (Fig. 1A) that the free energy of the unfolded state has a maximum value at pH = 5 and rapidly decreases at low and high pHs. Such behavior can be easily understood given Eqn 1. At low pH, the pK_sol of all acidic groups is higher than the current pH and thus they contribute negligibly to the partition function. In contrast, all basic groups contribute significantly to the partition function. As the pH decreases, their contribution increases, making the free energy more negative. At medium pHs, all ionizable groups are ionized (except His and Tyr), but their effect on the free energy is quite small, because their pK_sol are close to the pH. This results in a maximum of the free energy corresponding to the least favorable state. At high pHs, the situation is reversed: all acidic groups make a major contribution to the partition function, while bases add very little. Thus, the free energy profile of the unfolded state is always a smooth (bell-shaped) curve with a maximum at a certain pH. The shape of the curve and the position of the maximum depend entirely upon the amino acid composition.
Fig. 1. Cathepsin B pH-dependent properties.
(A) Free energy; (B) net charge.
Free energy of folded state. The free energy of the folded state behaves in a similar manner, but it changes less with the pH (Fig. 1A). Note that it has a maximum at pH = 6. The major difference occurs at low and high pHs, where the free energy of the folded state does not decrease as fast as that of the unfolded state. The 3D structure adds several new energy terms to the microstate energy (Eqn 8) and to the partition function: ΔpK_int(i) (which originates in part from the desolvation energy) and the pairwise interactions G_ij (a detailed discussion of the effect of desolvation and pairwise energies on the stability is given in [31]). If these two terms compensate each other, then Eqn 8 might be thought to resemble the microstate energy formula of the unfolded state, Eqn 1. But there is an important difference: the amino acids are coupled through the pairwise interactions. The pairwise energies are a function of the ionization states. Thus, the de-ionization of a given group will cancel its pairwise interaction energies with the rest of the protein. The effect of the coupling can be easily understood at the extremes of pH. Consider a very low pH such that the pKas of all acidic groups are higher than the current pH. At such a pH all acids will be fully protonated and thus the bases (having their own desolvation penalty) will be left without favorable interactions. Thus the energy of the folded state will be less favorable (because of the desolvation energy and the lack of favorable interactions) than the energy of the unfolded state.
Free energy of folding. The pH dependence of the free energy of folding results from the difference of the above free energies (Fig. 1A). It will always have a minimum at a certain pH (in principle it might have more than one minimum). This minimum may or may not coincide with the pH where the unfolded free energy has its maximum. The folding free energy always has a bell shape, and it is unfavorable at low and high pHs as compared to the free energy at the optimum pH.
Net charge. An alternative way of addressing the same question is to compute the net charge of the protein (Fig. 1B). One can see that at the extremes of pH, the protein is highly charged. At low pH it has a huge net positive charge and at high pH a huge net negative charge. A straightforward conclusion could be made that acidic/basic denaturation is caused by the repulsive forces among charges of the same sign. However, all these positive charges at low pH exist also at medium pH, where the proteins are stable. What is missing at low pH, and causes acid denaturation, is the favorable interactions with negatively charged groups. At low pH, bases are left without the support of acids, and they have to pay an energy penalty for their desolvation and for the unfavorable pairwise energies among themselves.
Equation 7 provides an additional tool for determining the optimum pH. At the optimum pH, the curve of the folding free energy must have an extremum, i.e. the curve must invert its pH behavior. At pH lower than the optimum pH, the free energy of folding should decrease with increasing pH, then it should have a minimum at a pH equal to the optimum pH, and then it should increase with further increase of the pH. Such behavior corresponds to a negative net charge difference between the folded and unfolded states at pH smaller than the optimum pH. As the pH increases, the magnitude of the net charge difference should get smaller, and at the optimum pH it should be zero. A further increase of the pH (above the optimum pH) should make the net charge difference a positive number. One can see in Fig. 1B that the net charge change upon folding follows such a pattern and is zero at pH = 5, where the free energy of folding has a minimum.
General analysis of the optimum pH
Comparison to experimental data. Although this paper focuses on the pH of maximal stability, it is useful to compare the calculated pH dependence of the folding free energy with experiment for a set of proteins subjected to extensive experimental measurements. Figure 2 plots the calculated and experimental pH dependence of the free energy of folding. The experimental data are taken from Fersht [66,67], Robertson [68] and Pace [10]. One can see that the calculated pH-dependent free energy agrees well with the experimental data. The most important conclusion for the aims of the paper is that the calculated pH-dependence profile of the free energy of folding is similar to that of the experiment. The only exception is ribonuclease A, where the calculated pH optimum is 8 while the experiment finds the best stability at pH = 6. It should be noted that the calculated results are similar to the results reported by Elcock [33] and Zhou [36] in the case of an idealized unfolded state. From the works of the above authors, as well as from the Karshikoff laboratory [34], one can see that the residual interactions in the unfolded state do not affect the pH optimum in the majority of the studied cases.
An additional possibility for comparison is offered by the mutant data. Table 2 shows the stability change of barnase caused by mutations of charged residues. The calculated numbers are the pKa shifts (with respect to the standard pK_sol) of each of these ionizable residues. Thus, the energy of the mutant residue is not taken into account in the numerical calculations. Even under such a simplification, the rmsd between the calculated and experimental numbers is 0.84 kcal·mol⁻¹.
Figure 3 compares the calculated optimum pH with the experimental optimum pH for the 28 proteins listed in Table 1. One can see that the calculated values are in good agreement with the experimental data. The slope of the fitting line is 0.93 and the Pearson correlation coefficient is 0.86. The rmsd between the calculated and experimentally determined optimum pHs is 0.73. The optimum pH ranges from 2 to 9 (4–9 experimentally), which provides a broad range of pHs to be compared.
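The rmsd quoted above can be recomputed from the experimental and calculated optimum pH columns of Table 1. The short sketch below is only an arithmetic check; the variable and function names are illustrative.

```python
import math

# Experimental and calculated optimum pH from Table 1, in table order.
experimental = [8.0, 6.5, 8.0, 6.0, 6.5, 7.0, 6.2, 4.15, 9.0, 5.15, 7.0, 5.5,
                7.5, 4.15, 8.5, 8.5, 5.9, 7.0, 7.0, 7.7, 5.7, 6.0, 8.0, 6.0,
                4.9, 5.6, 6.0, 5.5]
calculated = [8.0, 5.0, 7.5, 6.0, 7.0, 6.5, 6.0, 4.0, 9.0, 5.0, 7.0, 6.5,
              7.0, 3.0, 8.5, 7.0, 6.0, 7.5, 6.0, 7.0, 6.0, 6.5, 8.0, 7.0,
              4.0, 7.0, 5.0, 5.0]


def rmsd(a, b):
    """Root mean square deviation between two equally long lists."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty lists of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


rmsd_optimum_ph = rmsd(calculated, experimental)
```

With the 28 pairs of values from Table 1 this evaluates to about 0.73 pH units, in agreement with the number given in the text.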
The origin of the optimum pH. The position of the optimum pH
depends on the amino acid composition and on the organization of
the amino acids within the 3D structure. To find which of these two
factors dominates, we plotted the calculated optimum pH of the free
energy of folding vs. the pH at which the free energy of the
unfolded state has its maximum (Fig. 4). The free energy of folding
is the difference between the free energies of the folded and
unfolded states. Thus, if these two energies have the same pH
dependence, the free energy of folding will be pH independent. If
the free energies of the unfolded and folded states have similar
shapes and a maximum at the same pH, then most likely the optimum
pH will also be at this pH. If the curve of
178 E. Alexov (Eur. J. Biochem. 271) © FEBS 2003
the free energy of the folded state is steeper at basic pHs (or
flatter at acidic pHs) compared to the free energy of the unfolded
state, then the difference, i.e. the free energy of folding, will
have its optimum pH shifted to the right on the pH scale. Such a
phenomenon will occur if the protein stabilizes acids. Then the
optimum pH will be higher than the pH of maximal free energy of the
unfolded state (points above the
Table 2. Experimental and calculated effect of single mutants on the
stability of barnase.
Mutant        Experiment (kcal·mol⁻¹)   Calculation (kcal·mol⁻¹)
D12A          −0.95                     −1.83
R69S, R69M    −2.67, −2.24              −1.9
D75N          −4.51                     −2.92
R83Q          −2.23                     −4.07
D93N          −4.17                     −4.27
R110A         −0.45                     −2.17
Fig. 2. The calculated pH dependence of the free energy of folding
(solid line) and experimental data (symbols). The ionic strength was
selected to match experimental conditions: barnase (I = 50 mM),
OMKTY3 (I = 10 mM), CI2 (I = 50 mM) and ribonuclease A (I = 30 mM).
Fig. 3. The calculated optimum pH vs. the experimental optimum pH.
The figure shows only 27 data points, because the calculated and
experimental data for 1b4u and 1qt1 overlap.
Fig. 4. The calculated optimum pH vs. the pH of maximal free energy of
the unfolded state. Only 19 points can be seen in the figure, because
of an overlap, but all 28 points are taken into account in the
calculation of the correlation coefficient.
© FEBS 2003 Calculating pH of maximal protein stability (Eur. J. Biochem. 271) 179
diagonal). If the protein stabilizes bases (or destabilizes acids),
then the optimum pH is lower than the pH of the maximum of the free
energy of the unfolded state (points below the diagonal). The
points lying on the diagonal represent cases for which the amino
acid sequence dominates in determining the optimum pH. The points
below the diagonal show proteins with a pH optimum lower than the
pH of the maximum of the free energy of the unfolded state. The
points offset from the diagonal manifest the importance of the 3D
structure. In each case where the 3D structure causes a shift of
the solution pKa of ionizable groups, the stability changes
[31,69]. If the protein favors the charges, then the stability
increases. Of the 28 proteins studied in the paper, nine lie on the
main diagonal (tolerance 0.5 pK units), while 19 are offset by more
than 0.5 pK units. Thus, in 32% of the cases the amino acid
composition is the dominant factor determining the optimum pH and
in 68% of the cases the 3D structure is.
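The bookkeeping behind these percentages can be sketched in a few
lines of Python. The function `classify` and the three example
points are hypothetical illustrations; only the 0.5 tolerance and
the count of nine out of 28 proteins are taken from the text.

```python
def classify(opt_pH, unfolded_max_pH, tolerance=0.5):
    """Label one protein by where its point falls relative to the diagonal.

    opt_pH           calculated optimum pH of the free energy of folding
    unfolded_max_pH  pH at which the unfolded-state free energy is maximal
    """
    offset = opt_pH - unfolded_max_pH
    if abs(offset) <= tolerance:
        return "sequence-dominated (on diagonal)"
    if offset > 0:
        return "structure-dominated, acids stabilized (above diagonal)"
    return "structure-dominated, bases stabilized (below diagonal)"

# Hypothetical points, not values read from Fig. 4.
examples = [(7.0, 6.8), (8.0, 5.5), (3.0, 6.0)]
for opt, unf in examples:
    print(opt, unf, "->", classify(opt, unf))

# Counts quoted in the text: 9 of 28 proteins lie on the diagonal.
on_diagonal, total = 9, 28
print(f"on diagonal:  {100 * on_diagonal / total:.0f}%")
print(f"off diagonal: {100 * (total - on_diagonal) / total:.0f}%")
```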
To check for a possible correlation between the optimum pH and the
pK shifts with respect to the standard pKsol, they were plotted in
Fig. 5. Two pK shifts were calculated: the intrinsic pK shift,
which does not account for the interactions with ionizable and
polar groups, and the pKa shift, which reflects the total energy
change from solution to the protein for each ionizable group. In
both cases a correlation with the pH optimum exists, although the
correlation coefficients are low. A positive pK shift corresponds
to pKs of acids and bases larger than those of model compounds and
thus to an electrostatic environment that disfavors acids and
favors bases. The most acidic enzymes were found to use this
strategy to lower their optimum pH (see the far right-hand side of
Fig. 5). The most basic enzymes induce a slight positive shift of
the intrinsic pK, but adding the pairwise interactions turns the pK
shift into a negative number. The enzymes between these two
extremes do not induce a large pK shift on average.
It is well known that the pH dependence of the free energy is an
integral of the net charge difference between the folded and
unfolded states over a particular pH interval (Equation 7)
[31,55,70]. A negative net charge difference corresponds to a
negative change of the free energy (the free energy becomes more
favorable as the pH increases). Thus, if an acid has a pKa lower
than the standard pKsol, it will titrate at a lower pH in the
folded state than in the unfolded state. As a result, such a group
will contribute a negative number to the net charge difference.
Conversely, a positive net charge difference corresponds to a
positive free energy change, i.e. to a less favorable free energy
of folding. This corresponds to pKas higher than the standard
pKsol. At the optimum pH the net charge difference should be zero.
At very low and at very high pHs, the free energy of folding is
unfavorable, because either bases or acids are left without the
support of their oppositely charged partners. Between these two
extremes, the free energy of folding must have a minimum. Going
from very low pH to high pH, the first several ionization events
will be the deprotonation of acids. Because these few acids are in
the positive potential of the bases, they have pKas lower than in
the unfolded state and thus the net charge difference between the
folded and unfolded states will be negative. Thus, the free energy
of folding will decrease. If the protein does not support the
acids, then the rest of the acids will have pKas higher than those
of the unfolded state. This results in a positive net charge
difference between the folded and unfolded states and increases the
free energy of folding. Thus, the optimum pH will be at low pH.
Conversely, if the protein favors the acids, then most of them will
have pKas lower than in the unfolded state and the net charge
difference between the folded and unfolded states will be negative.
Thus, the free energy of folding will keep decreasing with
increasing pH. This will result in an optimum pH shifted to higher
pHs.
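The argument above can be illustrated numerically. The sketch below
is not the MCCE procedure used in the paper: it assumes independent
titratable groups that follow the Henderson-Hasselbalch equation,
and it assumes that the integral relation has the form
ΔG(pH) − ΔG(pH0) = 2.303 RT ∫ ΔQ dpH, with ΔQ the folded minus
unfolded net charge. All pKa values and names (`folding_profile`,
`folded_favors_acids`, and so on) are hypothetical toy choices.

```python
RT_LN10 = 1.364  # kcal/mol; 2.303*R*T at about 298 K (assumed temperature)

def group_charge(pH, pKa, kind):
    """Mean charge of one independent titratable group."""
    if kind == "acid":   # 0 when protonated, -1 when deprotonated
        return -1.0 / (1.0 + 10.0 ** (pKa - pH))
    if kind == "base":   # +1 when protonated, 0 when deprotonated
        return 1.0 / (1.0 + 10.0 ** (pH - pKa))
    raise ValueError(f"unknown group kind: {kind!r}")

def net_charge(pH, groups):
    return sum(group_charge(pH, pKa, kind) for pKa, kind in groups)

def folding_profile(folded, unfolded, pH_min=0.0, pH_max=14.0, step=0.05):
    """Return (pH grid, net charge difference, free energy relative to pH_min).

    The free energy is the trapezoid-rule integral of the folded minus
    unfolded net charge, scaled by 2.303*R*T.
    """
    n = round((pH_max - pH_min) / step)
    grid = [pH_min + i * step for i in range(n + 1)]
    dq = [net_charge(p, folded) - net_charge(p, unfolded) for p in grid]
    dg = [0.0]
    for i in range(1, len(grid)):
        dg.append(dg[-1] + RT_LN10 * 0.5 * (dq[i - 1] + dq[i]) * step)
    return grid, dq, dg

def optimum(grid, dg):
    """Return (optimum pH, grid index) at the minimum free energy of folding."""
    i = min(range(len(dg)), key=lambda k: dg[k])
    return grid[i], i

# Toy protein: two acids and one base, model pKa 4.0 and 10.4 when unfolded.
unfolded = [(4.0, "acid"), (4.0, "acid"), (10.4, "base")]
# Folded state that supports only the first acid (second acid pKa raised).
folded_disfavors_acids = [(3.0, "acid"), (6.0, "acid"), (12.4, "base")]
# Folded state that supports both acids (both pKas lowered).
folded_favors_acids = [(3.0, "acid"), (3.0, "acid"), (12.4, "base")]

for label, folded in [("disfavors acids", folded_disfavors_acids),
                      ("favors acids", folded_favors_acids)]:
    grid, dq, dg = folding_profile(folded, unfolded)
    opt, i = optimum(grid, dg)
    print(f"{label}: optimum pH ~ {opt:.2f}, net charge difference {dq[i]:+.3f}")
```

In this toy model the protein that raises the pKa of its second acid
reaches its minimum at acidic pH, the protein that lowers both acid
pKas reaches it near neutral pH, and in both cases the net charge
difference at the minimum is close to zero, as the text requires.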
The optimum pH is not uniquely determined by the ratio of basic to
acidic groups. Figure 6A demonstrates that enzymes with quite
different base/acid ratios have similar optimum pHs and that
proteins with similar base/acid ratios function at completely
different pHs. At the same time, a trend is clearly seen. The
proteins that function at low pH have fewer bases (low base/acid
ratio), while the enzymes working at high pH have more bases than
acids (see also Table 2). The Pearson correlation coefficient is
less than 0.4, which demonstrates that the base/acid ratio is not
the most important factor in determining the optimum pH. However,
restricting the count to buried amino acids only, one finds a much
better correlation (Fig. 6B). This improvement suggests that the pH
optimum is mostly determined by the buried charged groups, but the
correlation is still weak.
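A base/acid ratio of this kind can be counted directly from a
one-letter sequence, as in the hedged sketch below. It assumes that
bases are Arg and Lys and acids are Asp and Glu (this section does
not state whether His or other groups were counted), and it takes
the burial information as a user-supplied mask, since the burial
criterion is not given here. The sequence and the mask are
hypothetical.

```python
BASES = frozenset("RK")  # Arg, Lys (assumption; His could be added)
ACIDS = frozenset("DE")  # Asp, Glu

def base_acid_ratio(sequence, buried_mask=None):
    """Ratio of basic to acidic residues in a one-letter sequence.

    buried_mask, if given, holds one boolean per residue; only residues
    flagged True are counted (e.g. for a buried-only ratio).
    """
    if buried_mask is None:
        buried_mask = [True] * len(sequence)
    if len(buried_mask) != len(sequence):
        raise ValueError("mask and sequence must have the same length")
    counted = [aa for aa, keep in zip(sequence.upper(), buried_mask) if keep]
    n_base = sum(aa in BASES for aa in counted)
    n_acid = sum(aa in ACIDS for aa in counted)
    if n_acid == 0:
        raise ValueError("no acidic residues counted; ratio is undefined")
    return n_base / n_acid

# Hypothetical toy sequence and burial mask (first four residues buried).
toy_sequence = "MKDEERKLDAE"
toy_buried = [True] * 4 + [False] * 7
print("all residues:", base_acid_ratio(toy_sequence))
print("buried only: ", base_acid_ratio(toy_sequence, toy_buried))
```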
The effect of the net charge on the stability of the proteins is
demonstrated in Fig. 7A,B, where the optimum pH is plotted against
the calculated isoelectric point (pI) and the net charge at the
optimum pH. At the isoelectric point the net charge of the protein
is zero, i.e. there are equal numbers of negative and positive
charges. The graph shows that there is no correlation (Pearson
coefficient = 0.09) between the isoelectric point and the optimum
pH. At the same time, the correlation between the
Fig. 5. The experimental optimum pH vs. the averaged pK shifts.
(A) Averaged intrinsic pKa; (B) averaged pKa shift.
optimum pH and the net charge of the folded state is not
negligible. The signal is weak, but there is a clear tendency for
proteins with acidic optimum pH to be positively charged and for
proteins with basic optimum pH to carry a negative net charge.
There are only a few proteins that have no net charge at the
optimum pH.
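To make the distinction between the isoelectric point and the net
charge at a given pH concrete, the following sketch estimates both
from sequence alone. It is a simplification of what the paper does:
the paper's pI is calculated from the 3D structure, whereas here
every group is assumed to keep an approximate model-compound pKa.
The pKa table, the function names and the two peptides are
assumptions made for illustration.

```python
# Approximate model-compound pKa values (assumed; published sets differ).
ACID_PKA = {"D": 4.0, "E": 4.4, "Y": 10.2, "C_TERM": 3.6}
BASE_PKA = {"H": 6.3, "K": 10.4, "R": 12.0, "N_TERM": 8.0}

def net_charge(sequence, pH):
    """Net charge of a peptide with unperturbed, independent groups."""
    groups = list(sequence.upper()) + ["N_TERM", "C_TERM"]
    q = 0.0
    for g in groups:
        if g in ACID_PKA:
            q -= 1.0 / (1.0 + 10.0 ** (ACID_PKA[g] - pH))
        elif g in BASE_PKA:
            q += 1.0 / (1.0 + 10.0 ** (pH - BASE_PKA[g]))
    return q

def isoelectric_point(sequence, lo=0.0, hi=14.0, tol=1e-4):
    """Find the pH of zero net charge by bisection.

    The net charge decreases monotonically with pH, so the sign of the
    charge at the midpoint tells which half contains the pI.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if net_charge(sequence, mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical peptides: one rich in acids, one rich in bases.
for seq in ("DDEEDK", "KKRRKD"):
    pI = isoelectric_point(seq)
    print(f"{seq}: pI ~ {pI:.1f}, net charge at pH 7 = {net_charge(seq, 7.0):+.2f}")
```

Evaluating `net_charge` at a protein's optimum pH rather than at
pH 7 gives the quantity plotted in Fig. 7B, under the same
simplifying assumptions.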
Discussion
The study has shown that the pH of maximal stability can be
calculated using the 3D structure of proteins. Twenty-eight
different proteins were studied, most of them with undetectable
sequence and structural similarity. The optimum pH varies from very
acidic to very basic. Such diversity provided a good test for the
computational method (MCCE) used in the study. Relatively good
agreement with the experimental data was achieved, resulting in a
correlation of 0.85 and rmsd = 0.73. At the same time, as indicated
in Fig. 3, there are three proteins whose calculated optimum pH is
offset by about 1.5 pK units from the experimental value (see
Table 1). The reason for such a discrepancy could be conformational
changes that are not included in the model. In addition, all
calculations were carried out at physiological salt concentration
(I = 0.15 M), while the experimental conditions under which the
optimum pH was measured are in many cases not available. This may
or may not be a source of significant error, because although the
salt concentration strongly affects the pKa values in proteins
[71,72] and in model compounds [73], it may not necessarily affect
the optimum pH [74]. At the same time, it is interesting to point
out that the rmsd of the calculated from the experimental pH
optimum is 0.73, which is similar to, and slightly better than, the
average rmsd of pKa calculations [25].
Two major factors determine the optimum pH: the amino acid
composition and the 3D structure of the proteins. The relative
importance of these two factors varies among the proteins. To test
our conclusions, two proteins that have different optimum pHs
(acidic and basic) and are structurally superimposable are
discussed below.
Figure 8A shows a structural alignment of acid α-amylase (pdb code
2aaa) and xylose isomerase (pdb code 1qt1). The first protein has
an acidic optimum pH (calculated optimum pH = 4, experimental
optimum pH = 4.9), while the second has a basic optimum pH
(calculated and experimental optimum pH = 8). The core structures
of the proteins are well aligned (rmsd = 5.0 Å and PSD = 1.47
[75]). The part of the sequence alignment generated from the
structural superimposition is shown in Fig. 8B. The positions that
correspond to Arg or Lys residues in the xylose isomerase sequence
and are aligned to nonbasic groups in the acid α-amylase sequence
are highlighted. One can see that 31 basic groups of the xylose
isomerase sequence are replaced by negative, polar or neutral
groups in the acid α-amylase sequence. There are only a few
examples of the opposite case, which are not shown in the figure.
This results in a base/acid ratio of 0.51 for acid α-amylase and
0.84 for xylose isomerase. This difference in the amino acid
composition results in a different pH dependence of the free energy
of the unfolded state and thus demonstrates the effect of the amino
acid composition on the optimum pH. From a structural point of view
it is interesting to mention that most of the
Fig. 7. The experimental optimum pH vs. the calculated isoelectric
point (A) and the net charge at pH optimum (B).
Fig. 6. The experimental optimum pH vs. the ratio of bases/acids.
Twenty-seven data points can be seen, because of the overlap between
1qt1 and 1b4u. (A) All amino acids; (B) buried amino acids.
extra basic groups within the xylose isomerase structure are not
within the extra loop regions, but rather within the core structure
(see Fig. 8A). This confirms the observation (Fig. 6B) that buried
groups affect the optimum pH and that an enzyme with an acidic
optimum pH has a low base/acid ratio. It remains to be shown that
this is a general behavior of all enzymes operating at low pH.
The three-dimensional structure of the protein plays an even more
significant role than the sequence composition in determining the
optimum pH (68% of the cases in this work). The ability of
Fig. 8. Alignment of acid α-amylase (2aaa.pdb) and xylose isomerase
(1qt1.pdb). (A) Structural and sequence alignments were carried out
with GRASP2 [79]. Structural alignment in ribbon representation: the
acid amylase backbone is shown in green and xylose isomerase in blue.
The red patches show the positions of substitution of Arg/Lys to
negative, polar or neutral groups from xylose isomerase to acid
amylase (see Fig. 8B). (B) Sequence alignment from the structural
superimposition: highlighted are the positions at which Arg/Lys in
the xylose isomerase sequence are aligned to acidic, polar or neutral
groups in the acid α-amylase sequence.
[...]... engineering the surface charges of ribonuclease Sa [12]. Increasing the net charge of the molecule does not change its pH of maximal stability, but changes the isoelectric point and increases solubility [12]. Another strategy used to reduce the bias from the amino acid composition is to change the pKas of ionizable groups in the protein. If the protein favors the negative charges on acidic groups, then the optimum pH ...
... the proteins to reduce the bias of the amino acid sequence composition was shown by comparing the isoelectric point, the net charge and the optimum pH. It was shown that for most proteins the optimum pH does not coincide with the pI and that the protein is most stable when it carries a net charge. This ...
... pH-dependence curve is sensitive to the model of the unfolded state, the optimum pH does not depend significantly on it [33–36]. The success of the modeling of the pH-dependent free energy of folding critically depends on the accuracy of the calculated pKas of the ionizable groups. Recent benchmarks of MCCE on 166 titratable groups resulted in an rmsd of 0.83 pK as compared to the experimentally determined pKas ...
... [10].) The modeling of the unfolded state would eventually require molecular dynamics runs [33] or some assumptions about the organization of the amino acids in the unfolded state [34,36] or even an experimental determination of the pKas in model compounds [35,73]. Our goal was to compute the pH at which the free energy of folding has its minimum. It was shown in the literature that while the shape of the pH-dependence ...
ionizable groups by the protein always increases protein stability. It should be emphasized that this paper does not attempt to calculate all of the details of the pH dependence of the free energy of denaturation. This would require an appropriate model of the unfolded state [7,66], which is believed to be compact and native-like. (In addition, the denatured state may not be the same in thermal, urea ...
... (2000) pH dependence of stability of staphylococcal nuclease: evidence of substantial electrostatic interactions in the denatured state. Biochemistry 39, 14292–14304.
8. Pots, A., Jongh, H., Gruppen, H., Hessing, M. & Voragen, A. (1998) The pH dependence of the structural stability of patatin. J. Agric. Food Chem. 46, 2546–2553.
9. Khurana, R., Hate, A., Nath, U. & Udgaonkar, B. (1995) pH dependence of the stability ...
... modeling of the denatured states of proteins allows accurate calculations of the pH dependence of protein stability. J. Mol. Biol. 294, 1051–1062.
34. Kundrotas, P. & Karshikoff, A. (2002) Modeling of denatured state for calculation of the electrostatic contribution to protein stability. Prot. Sci. 11, 1681–1686.
35. Tollinger, M., Crowhurst, K., Kay, L. & Forman-Kay, J. (2003) Site-specific contributions to the pH dependence ...
... pH as compared to the pH at which the unfolded free energy has its maximum and vice versa (Fig. 5). The same is valid for basic groups but the effect is less noticeable simply because their pKas are too high (except for histidines). It should be emphasized that one should distinguish between the amplitude of the free energy of folding and the optimum pH. As discussed in previous papers [31,69], the stabilization of ...
made to study the sensitivity of the results to different values of the dielectric constant. Other parameters that were not tested include the charge set [76], the choice of molecular surface (van der Waals surface vs. molecular surface) [56,77,78] and the effect of energy minimization of PDB structures [26]. These will require a separate study. In addition, it should be noted that the relatively ...
... assumes that the pKas of the protein are the same as in model compounds) will not work in this case, because it will result in a pH-independent free energy of folding. Despite several failures, the presented methodology can predict the optimum pH with reasonable accuracy. This information can be used to identify a possible cellular compartment or body organ where the protein may function. Obviously a protein with ...
... range of pHs, then the optimum pH is taken to be the middle of the pH range.
The optimum pH in the numerical calculation is determined as the pH at which the ...
... was determined in the numerical calculations as the pH of the minimum free energy of folding. The experimental data for the pH of maximal stability (experimental ...