REVIEW ARTICLE
The carbohydrate-binding module family 20 – diversity,
structure, and function
Camilla Christiansen¹,²,³, Maher Abou Hachem², Štefan Janeček⁴,⁵, Anders Viksø-Nielsen³,
Andreas Blennow¹ and Birte Svensson²
1 VKR Research Centre Pro-Active Plants, Department of Plant Biology and Biotechnology, Faculty of Life Sciences, University of
Copenhagen, Frederiksberg, Denmark
2 Enzyme and Protein Chemistry, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
3 Novozymes A/S, Bagsvaerd, Denmark
4 Institute of Molecular Biology, Slovak Academy of Sciences, Bratislava, Slovakia
5 Department of Biotechnology, Faculty of Natural Sciences, University of SS Cyril and Methodius, Trnava, Slovakia
Keywords
α-glucan; amylolytic enzymes; glucan;
molecular recognition; starch-binding
domain; starch metabolism; water dikinase
Correspondence
M. Abou Hachem, Enzyme and Protein
Chemistry, Department of Systems Biology,
Technical University of Denmark, Søltofts
Plads, Building 224, DK-2800 Kgs. Lyngby,
Denmark
Fax: +45 4588 6307
Tel: +45 4525 2732
E-mail: maha@bio.dtu.dk
(Received 22 May 2009, accepted 17 July
2009)
doi:10.1111/j.1742-4658.2009.07221.x
Starch-active enzymes often possess starch-binding domains (SBDs) mediating
attachment to starch granules and other high molecular weight substrates.
SBDs are divided into nine carbohydrate-binding module (CBM) families, and
CBM20 is the earliest-assigned and best characterized family. High diversity
characterizes CBM20s, which occur in starch-active glycoside hydrolase
families 13, 14, 15, and 77, and in enzymes involved in starch or glycogen
metabolism, exemplified by the starch-phosphorylating enzyme glucan, water
dikinase 3 from Arabidopsis thaliana and the mammalian glycogen
phosphatases, laforins. The clear evolutionary relatedness of CBM20s to
CBM21s, CBM48s and CBM53s suggests a common clan hosting most of the known
SBDs. This review surveys the diversity within the CBM20 family, and makes
an evolutionary comparison with CBM21s, CBM48s and CBM53s, discussing
intrafamily and interfamily relationships. Data on binding to and enzymatic
activity towards soluble ligands and starch granules are summarized for
wild-type, mutant and chimeric fusion proteins involving CBM20s. Notably,
whereas CBM20s in amylolytic enzymes confer moderate binding affinities,
with dissociation constants in the low micromolar range for the starch mimic
β-cyclodextrin, recent findings indicate that CBM20s in regulatory enzymes
have weaker, low millimolar affinities, presumably facilitating dynamic
regulation. Structures of CBM20s, including the first example of a
full-length glucoamylase featuring both the catalytic domain and the SBD,
are summarized, and distinct architectural and functional features of the
two SBDs and roles of pivotal amino acids in binding are described. Finally,
some applications of SBDs as affinity or immobilization tags and, recently,
in biofuel and in planta bioengineering are presented.
Abbreviations
AMPK, AMP-activated protein kinase; CBM, carbohydrate-binding module; CGTase, cyclodextrin glucanotransferase; DP, degree of
polymerization; GA, glucoamylase; GH, glycoside hydrolase; GWD3, glucan, water dikinase 3; ITC, isothermal titration calorimetry; SBD,
starch-binding domain; SEX4, starch excess 4 protein.
5006 FEBS Journal 276 (2009) 5006–5029 © 2009 The Authors Journal compilation © 2009 FEBS
Introduction
Starch and glycogen – structure and enzymatic
degradability
Starch is the major energy reserve in plants and the
most important energy source in the human diet. The
starch granule is a complex structure composed of two
distinct glucose polymers: amylose, comprising essentially
unbranched α-(1→4)-linked glucose residues,
and the larger and branched amylopectin, produced by
the formation of α-(1→6) linkages between adjoining
straight glucan chains on an α-(1→4) backbone. In
starch granules, amylopectin and amylose molecules are
organized as alternating semicrystalline and amorphous
layers forming radial growth rings [1,2]. Whereas little
is known about the structure of the amorphous layers,
the semicrystalline layers are made of short linear amy-
lopectin chains packed together as parallel left-handed
double helices. These helical segments extend from glu-
cosidic branch points, and are further packed into con-
centric arrays known as crystalline lamellae [3].
Amorphous regions of the semicrystalline layers and the
amorphous layers are composed of amylose and nonor-
dered amylopectin branch chains.
The enzymatic degradation of starch and other
insoluble polysaccharides poses a considerable chal-
lenge to the attacking enzymes, as the polysaccharide
chains are often poorly accessible to the active sites.
The degradation, moreover, involves mass transfer in a
two-phase system comprising the bulk of the medium
and the surface of starch granules. Finally, despite the
common structural features, starch granules show
remarkable morphological variation, depending on
botanical origin and tissue or compartmentalization
[4], and such differences are also important for the
degradability of the starch granule [3].
From a chemical viewpoint, starch and glycogen are
very similar, but they differ considerably in their
molecular fine structure, physical properties, and sus-
ceptibility to enzymatic degradation. Glycogen does
not crystallize in water and has a higher degree of
branching than starch, making fast enzymatic degrada-
tion possible and providing short-term energy for rapid
metabolic needs [5]. However, the chemical similarity
between starch and glycogen results in overlapping
molecular recognition by enzymes or binding proteins
targeting the two polymers.
Starch recognition and starch-binding domains (SBDs)
Microbial extracellular hydrolytic enzymes that cata-
lyse the degradation of starch granules or plant cell
walls typically possess a modular architecture, and
very often contain noncatalytic ancillary domains
referred to as carbohydrate-binding modules (CBMs),
which target cognate catalytic modules to specific poly-
saccharide structures. Binding of enzymes to insoluble
polysaccharide surfaces such as starch granules or crys-
talline cellulose is enhanced by CBMs, which have also
been suggested to distort the conformation and pack-
ing of the polymers, thereby facilitating their degrada-
tion. CBMs with affinity for starch are commonly
referred to as SBDs. The first discovered SBDs were
placed in the CBM20 family, which remains the best
characterized SBD family to date. Thus, starch binding
was described for CBM20s of, for example, glucoamy-
lase (GA, EC 3.2.1.3) from Aspergillus niger [6–8],
cyclodextrin glucanotransferase (CGTase, EC 2.4.1.19)
from Bacillus circulans strain 251 [9,10], maltogenic
α-amylase (EC 3.2.1.133) from Geobacillus stearothermophilus
[11], and β-amylase (EC 3.2.1.2) from
Bacillus cereus var. mycoides [12,13].
Whereas the early CBM20s were found in extracellu-
lar starch hydrolases secreted by fungi or bacteria [14–
16], an increasing number of newly reported CBM20s
show diversity with respect to phylogenetic origin and
function of the appended catalytic modules. CBM20s
thus occur in a wide spectrum of secreted or intracellu-
lar, amylolytic and nonamylolytic enzymes from plants,
mammals, archaeans, bacteria, and fungi. A common
feature of CBM20s is that they are joined to catalytic
modules associated with starch or glycogen meta-
bolism. Recent evidence indicates that a few of the
enzymes possessing CBM20s have regulatory functions
[17,18]. Remarkably, the affinity of these regulatory
enzymes for starch seems to be relatively low, suggest-
ing a much more dynamic role of their SBDs than for
those found in extracellular starch-degrading enzymes
[19]. This review describes the current knowledge on
SBDs, exemplified by well-characterized CBM20s, and
follows up on recent revelations regarding SBDs with
plausibly novel physiological functions that are differ-
ent from those reported earlier for CBM20s and the
related CBM21s, CBM48s, and CBM53s.
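To put the contrast between micromolar and millimolar affinities on a common scale, a dissociation constant can be converted into a standard binding free energy with the textbook relation ΔG° = RT ln(Kd). The short Python sketch below does this for two illustrative Kd values; the values 10 µM and 1 mM, the temperature of 298.15 K and the function name are assumptions chosen for illustration, not measurements reported in this review.

```python
import math

R_KJ = 8.314462618e-3  # gas constant in kJ/(mol*K)


def binding_free_energy(kd_molar: float, temperature_k: float = 298.15) -> float:
    """Standard binding free energy in kJ/mol from a dissociation constant.

    Uses dG = R*T*ln(Kd) with a 1 M standard state, so a smaller Kd
    (tighter binding) gives a more negative value.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KJ * temperature_k * math.log(kd_molar)


# Illustrative Kd values only (assumed, not taken from the review):
kd_amylolytic = 10e-6  # "low micromolar" example
kd_regulatory = 1e-3   # "low millimolar" example

dg_amylolytic = binding_free_energy(kd_amylolytic)
dg_regulatory = binding_free_energy(kd_regulatory)

print(f"amylolytic CBM20 example: {dg_amylolytic:.1f} kJ/mol")
print(f"regulatory CBM20 example: {dg_regulatory:.1f} kJ/mol")
print(f"difference:               {dg_regulatory - dg_amylolytic:.1f} kJ/mol")
```

Under these assumed values, a 100-fold weaker affinity corresponds to roughly 11 kJ/mol less binding free energy, which is consistent with the idea that weakly binding SBDs can associate and dissociate more readily.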
Classification
By analogy with glycoside hydrolases (GHs), CBMs
are categorized into families based on amino acid
sequence similarities, and members of each family
share a common structural fold [20], but not necessar-
ily specificity. The generic term CBM refers to noncat-
alytic carbohydrate-binding domains, which are
currently grouped in 54 different families (see
http://www.cazy.org/fam/acc_CBM.html). The first classified
noncatalytic binding domains were originally defined
as cellulose-binding domains, because their specificity
was towards crystalline cellulose [21,22]. Similar obser-
vations were made for other polysaccharide-degrading
enzymes, such as plant chitinases (EC 3.2.1.14) [23,24],
and a new term, carbohydrate-binding module, was
introduced to reflect the diverse polysaccharide speci-
ficity [20,25,26]. A CBM is defined as a contiguous
amino acid sequence from a carbohydrate-active
enzyme that folds as a separate domain and shows
carbohydrate-binding ability.
Currently, nine CBM families, 20, 21, 25, 26, 34, 41,
45, 48, and 53, have been reported to contain SBDs.
Their three-dimensional structures are available, except
for CBM45 and CBM53. Despite low sequence similar-
ity, the remaining seven families share a very similar
fold. The structural features of CBM20s will be
discussed in further detail below. The CBM20 family
currently has about 300 entries in the CAZy database,
representing high phylogenetic diversity, as CBM20s
are encountered in archaeans (5), bacteria (152), and
eukaryotes (123). Twenty unclassified entries, all
descending from industrial patents, are very likely of
bacterial origin. Bacterial and fungal CBM20s are
most frequently connected with catalytic domains in
GH13s, or the α-amylase family; GH14s and GH15s
contain β-amylases and GAs, respectively [27,28]. Several
specificities are found among the GH13s [29]:
α-amylase (EC 3.2.1.1) [30,31]; CGTase (EC 2.4.1.19)
[10]; maltotetraose-forming exo-amylase (EC 3.2.1.60)
[32]; and maltogenic α-amylase (EC 3.2.1.133) [11]. A
few CBM20s are from plant enzymes, including α-glucan,
water dikinase 3 (GWD3) (EC 2.7.9.5) [18] and
the GH77 4-α-glucanotransferase (EC 2.4.1.25) [33].
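The entry counts quoted above can be checked with a few lines of Python. The sketch below simply sums the four groups given in the text and reports each group's share; the dictionary keys are labels chosen here for illustration.

```python
# Entry counts for the CBM20 family as quoted in the text.
cbm20_entries = {
    "archaeans": 5,
    "bacteria": 152,
    "eukaryotes": 123,
    "unclassified (patents)": 20,
}

total = sum(cbm20_entries.values())

# Percentage share of each group, rounded to one decimal place.
shares = {group: round(100 * n / total, 1) for group, n in cbm20_entries.items()}

print(f"total entries: {total}")
for group, share in shares.items():
    print(f"{group}: {share}%")
```

The four groups add up to the "about 300 entries" stated in the text, with bacterial entries making up roughly half of the family.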
Predicted glucan-binding residues
CBM20s are 90–130 amino acids long, and detailed
sequence analysis has revealed that there are no invariant
residues in the family [34–36]. Nevertheless, the
ability of CBM20s to bind to starch seems to be asso-
ciated with certain consensus residues. Originally, 11
conserved positions were indicated, on the basis of
eight aligned sequences from fungi and bacteria [37].
Analysis of a much larger set of CBM20s, however,
suggested that some of these residues are more impor-
tant than others, owing to their higher degree of
conservation across different taxa and functionalities
[28,34]. Four of the highly conserved residues are aro-
matic amino acids, which are typically directly
involved in glucan interactions.
Three-dimensional structures provide support for
two separate glucan-binding sites in CBM20s. Binding
site 1 contains two conserved tryptophans, Trp543 and
Trp590, in the canonical A. niger GA SBD [7].
Another tryptophan, conserved in 96% of the 103
analysed sequences, corresponds to Trp563 in the GA
SBD [34] and is assigned to binding site 2. Tyrosines,
assigned to binding site 2, aligning with Tyr527 and
Tyr556 in GA SBD, are conserved in 24 and 45 of the
103 analysed sequences, respectively. In addition to the
original 11 conserved residues, Phe519 in A. niger GA
SBD is found in 87% of the sequences and is replaced
by isoleucine, leucine or valine in several bacterial
β-amylases. Notably, Arabidopsis thaliana GWD3 has
an arginine at this position (Fig. 1) [34]. Finally, the
W615K mutant of the highly conserved Trp615 in GA
SBD was difficult to produce, suggesting that this resi-
due plays a structural role [7].
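Conservation figures of the kind quoted above (for example, a tryptophan present in 96% of the analysed sequences) are obtained by counting residues in one column of a multiple sequence alignment. The following Python sketch shows the calculation on a toy alignment; the sequences, column indices and function name are invented for illustration and do not come from the alignment in Fig. 1.

```python
from collections import Counter


def column_conservation(alignment, column, residues):
    """Fraction of sequences carrying one of `residues` at a 0-based column.

    `alignment` is a list of equal-length aligned sequences; gaps ('-')
    count as ordinary characters and therefore lower the fraction.
    """
    if not alignment:
        raise ValueError("alignment is empty")
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("aligned sequences must have equal length")
    counts = Counter(seq[column] for seq in alignment)
    return sum(counts[r] for r in residues) / len(alignment)


# Hypothetical toy alignment (five sequences, seven columns).
toy_alignment = [
    "TWGEN-Y",
    "TWGQNLY",
    "SFGEN-Y",
    "TWGEN-F",
    "TW-ENLY",
]

trp_only = column_conservation(toy_alignment, 1, "W")
aromatic = column_conservation(toy_alignment, 1, "WFY")
tyr_last = column_conservation(toy_alignment, 6, "Y")

print(f"Trp at column 1:           {trp_only:.0%}")
print(f"Trp/Phe/Tyr at column 1:   {aromatic:.0%}")
print(f"Tyr at column 6:           {tyr_last:.0%}")
```

Passing a set of residues rather than a single one makes it easy to distinguish strict conservation of tryptophan from conservative aromatic substitutions by phenylalanine or tyrosine.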
CBM20s are evolutionarily related to
CBM21s, CBM48s, and CBM53s
CBM20s were originally found at the C-termini of var-
ious starch hydrolases and CGTases, whereas CBM21s
were positioned N-terminally, as in GA from Rhizo-
pus oryzae [28,37]. Bioinformatics analysis suggested
that CBM20s and CBM21s constitute a CBM clan,
despite their low sequence identity [34]. Thus, CBM20s
and CBM21s were predicted to have similar secondary
and tertiary structures, and this was confirmed
by the solved structure of R. oryzae GA CBM21
[38], which has a conventional β-sandwich fold
and immunoglobulin-like architecture. Several plant
4-α-glucanotransferases and GWD3, as well as the
majority of CBM20-containing unknown eukaryotic
proteins [34], possess N-terminal CBM20s. CBM20s
are thus predominantly, but not exclusively
C-terminally positioned. In addition to CBM21s,
CBM48s and CBM53s are related to CBM20s [35,36],
and will be included in the evolutionary analysis.
SBDs occur with a variety of enzymatic activities
Alignment of 60 selected SBD sequences (Fig. 1) illus-
trates the close evolutionary relationship of the four
CBM families 20, 21, 48, and 53, although occasional
subtle structural differences make unambiguous classifi-
cation challenging. The corresponding evolutionary tree
(Fig. 2), containing 33 sequences from the largest
family, CBM20, eight from CBM21, 16 from CBM48,
and three from the new family, CBM53, highlights the
relationship. This set of sequences was created in an
effort to cover a wide spectrum of known CBM20s, i.e.
Fig. 1. Amino acid sequence alignment of CBM families 20 (green), 21 (red), 48 (blue), and 53 (magenta). The abbreviations of the source proteins are given in Table 1. Eleven CBM20
conserved residues [37] are highlighted as follows: two tryptophans of binding site 1 in blue and substitutions by phenylalanine or tyrosine in turquoise; one tryptophan of binding site 2 in
red, substitutions by phenylalanine or tyrosine in magenta; the remaining eight residues are in yellow. The tyrosines or phenylalanines at CBM21 binding site 2 are in green. The additional
well-conserved phenylalanine in CBM20s and CBM48s is in black, and the invariant CBM21 lysine is in grey. The alignment was performed by
CLUSTALW at the European Bioinformatics Institute's server (http://www.ebi.ac.uk/) and then manually adjusted.
from microbial amylolytic enzymes of GH13, GH14,
and GH15, the plant GWD3 [18], and the mammalian
proteins laforin (EC 3.1.3.16/48) [39] and genethonin-1
[40]. CBM21s from GH13 and GH15 and from regulatory
subunits of protein phosphatases [41] were proposed to form
the common CBM20–CBM21 clan [34], which was later
extended to include CBM48s from the GH13 pullulanase (EC
3.2.1.41) subfamily [42], regulatory domains of mammalian
AMP-activated protein kinase (AMPK) [43] and
the plant starch excess 4 (SEX4, EC 3.1.3.48) protein
[36,44]. Moreover, three tandem SBD repeats from
Ar. thaliana starch synthase III (EC 2.4.1.21) [45,46],
recently placed into the new CBM53 family, were
included to broaden the evolutionary comparison of
SBDs belonging to these four CBM families (Table 1).
Individual SBD families display subtle
binding-site differences
Alignment of amino acid sequences of SBDs and SBD-
like motifs from CBM20s, CBM21s, CBM48s and
CBM53s (Fig. 1) reveals evolutionary relationships,
especially concerning positions occupied by aromatic
residues identified in binding sites 1 and 2 in CBM20s
[8]. It is evident that the 11 conserved residues in
CBM20s [37] are not strongly conserved in CBM21s or
CBM48s, and even some CBM20s lack certain con-
sensus residues (Fig. 1). One prominent example is
CBM20 of acarviose transferase from Actinoplanes sp.,
which has been shown to bind to a starch resin [47],
and has glycine replacing the conserved tryptophan in
binding site 2 (Fig. 1). This tryptophan, however, is
found in CBM21s from GH13 and GH15 amylases,
whereas it may be replaced by tyrosine or phenyl-
alanine in CBM20s and CBM21s of nonamylolytic
enzymes (Fig. 1). Most CBM21s contain the two try-
ptophans at CBM20 binding site 1, even though this
site in CBM21 from R. oryzae GH15 GA is located at
a different position in the sequence, owing to a slightly
different topology of this CBM family, where a strand
must be shifted for overlap with CBM20 topology [38].
With respect to CBM48s, one of the two tryptophans
in CBM20 binding site 1 (Trp543 in A. niger GA SBD)
is lacking in malto-oligosyltrehalose hydrolases (EC
3.2.1.141), pullulanases and isoamylases of GH13. By
contrast, CBM48s in GH13 branching enzymes and in
Fig. 2. Evolutionary tree of CBM families 20 (green), 21 (red), 48 (blue), and 53 (magenta). The abbreviations of the source proteins are
given in Table 1. The tree is based on the alignment of complete CBM sequences (shown in Fig. 1), including the gaps. It was calculated as
a PHYLIP tree type using the neighbour-joining method (http://www.ebi.ac.uk/) and displayed by the program TREEVIEW [146].
Table 1. SBD and/or SBD-like sequences from CBM families 20, 21, 48, and 53.
Family | Abbreviation | Specificity | EC | Source | GenPept | Length
20 | AMY_Aspka | α-Amylase | 3.2.1.1 | Aspergillus kawachii | BAA22993 | 640
20 | AMY_Bacsp | α-Amylase | 3.2.1.1 | Bacillus sp. TS-23 | AAA63900 | 613
20 | AMY_Crcsp | α-Amylase | 3.2.1.1 | Cryptococcus sp. S-2 | BAA12010 | 631
20 | AMY_Strgr | α-Amylase | 3.2.1.1 | Streptomyces griseus | CAA40798 | 566
20 | MGA_Bacst | Maltogenic α-amylase | 3.2.1.133 | Bacillus stearothermophilus | AAA22233 | 719
20 | M3H_Brasp | Maltotriohydrolase | 3.2.1.116 | Brachybacterium sp. LB25 | BAE94180 | 615
20 | M4H_Psest | Maltotetraohydrolase | 3.2.1.60 | Pseudomonas stutzeri | AAA25707 | 548
20 | M5H_Psesp | Maltopentaohydrolase | 3.2.1.– | Pseudomonas sp. KO-8940 | BAA01600 | 614
20 | CGT_Bacci | Cyclodextrin glucanotransferase | 2.4.1.19 | Bacillus circulans strain 251 | CAA55023 | 713
20 | CGT_Klepn | Cyclodextrin glucanotransferase | 2.4.1.19 | Klebsiella pneumoniae | AAA25059 | 655
20 | CGT_Thbth | Cyclodextrin glucanotransferase | 2.4.1.19 | Thermoanaerobacterium thermosulfurogenes | AAB00845 | 710
20 | CGT_Thcsp | Cyclodextrin glucanotransferase | 2.4.1.19 | Thermococcus sp. B1001 | BAA88217 | 739
20 | CGT_Hafme | Cyclodextrin glucanotransferase | 2.4.1.19 | Haloferax mediterranei | CAI46245 | 713
20 | ACT_Actsp | Acarviose transferase | 2.4.1.19 | Actinoplanes sp. 50/110 | AAE37556 | 724
20 | BMY_Bacce | β-Amylase | 3.2.1.2 | Bacillus cereus | BAA34650 | 546
20 | BMY_Bacme | β-Amylase | 3.2.1.2 | Bacillus megaterium | CAB61483 | 545
20 | BMY_Thbth | β-Amylase | 3.2.1.2 | Thermoanaerobacterium thermosulfurogenes | AAA23204 | 515
20 | GMY_Aspka | Glucoamylase | 3.2.1.3 | Aspergillus kawachii | BAA00331 | 639
20 | GMY_Aspni | Glucoamylase | 3.2.1.3 | Aspergillus niger | AAB59296 | 640
20 | GMY_Hypje | Glucoamylase | 3.2.1.3 | Hypocrea jecorina | 2VN7_A | 599
20 | GMY_Lened | Glucoamylase | 3.2.1.3 | Lentinula edodes | AAF75523 | 571
20 | GMY_Neucr | Glucoamylase | 3.2.1.3 | Neurospora crassa | AAE15056 | 626
20 | 6AGT_Artgl | 6-α-Glucosyltransferase | 2.4.1.– | Arthrobacter globiformis | BAD34980 | 965
20 | 4AGT_Soltu | 4-α-Glucanotransferase | 2.4.1.25 | Solanum tuberosum | AAR99599 | 948
20 | GWD3_Arath | α-Glucan, water dikinase | 2.7.9.45 | Arabidopsis thaliana | AY747068 | 1196
20 | GEN_Homsa | Genethonin-1 | – | Homo sapiens | AAC78827 | 358
20 | LAF_Homsa | Laforin | 3.1.3.48/16 | Homo sapiens | AAG18377 | 331
20 | GPD_Homsa | Glycerophosphodiester phosphodiesterase | 3.1.–.– | Homo sapiens | AAH27588 | 672
20 | APU_Bacst | Amylopullulanase | 3.2.1.1/41 | Bacillus stearothermophilus | AAG44799 | 2018
20 | APU_Thbth | Amylopullulanase | 3.2.1.1/41 | Thermoanaerobacterium thermosulfurogenes | AAB00841 | 1861
20 | IGT_Bacci | Isocyclomaltooligosaccharide glucanotransferase | 2.4.1.– | Bacillus circulans | BAF37283 | 995
20 | CE1_Pyrfu | Carbohydrate esterase 1 | – | Pyrococcus furiosus | AAL81232 | 404
20 | CE1_Thcko | Carbohydrate esterase 1 | – | Thermococcus kodakarensis | BAD84711 | 449
21 | AMY_Lipko | α-Amylase | 3.2.1.1 | Lipomyces kononenkoae | AAC49622 | 624
21 | AMY_Lipst | α-Amylase | 3.2.1.1 | Lipomyces starkeyi | AAN75021 | 647
21 | GMY_Arxad | Glucoamylase | 3.2.1.3 | Arxula adeninivorans | CAA86997 | 624
21 | GMY_Mucci | Glucoamylase | 3.2.1.3 | Mucor circinelloides | AAN85206 | 609
21 | GMY_Rhior | Glucoamylase | 3.2.1.3 | Rhizopus oryzae | AAQ18643 | 604
21 | PPRS_Cloac | Protein phosphatase 1 regulatory subunit | – | Clostridium acetobutylicum | AAK76874 | 247
21 | PPRS_Homsa | Protein phosphatase 1 regulatory subunit | – | Homo sapiens (human brain) | AAH47502 | 299
21 | PPRS_Sacce | Protein phosphatase 1 regulatory subunit | – | Saccharomyces cerevisiae | CAA86906 | 538
48 | AMPK1_Ratno | AMPK β1 subunit | – | Rattus norvegicus | AAH62008 | 270
48 | AKIN1_Zeama | AKIN-β-γ-1 protein | – | Zea mays | AF276085 | 497
48 | SEX4_Arath | Starch excess 4 protein | 3.1.3.48 | Arabidopsis thaliana | AAN28817 | 379
48 | SNF1_Orysa | SNF1-related regulatory subunit β1 | – | Oryza sativa | ABF95644 | 295
48 | GSs_Grija | Glycogen synthase subunit | 2.4.1.21 | Griffithsia japonica | AAM93999 | 201
48 | GBE_Escco | Glycogen branching enzyme | 2.4.1.18 | Escherichia coli | AAA23872 | 728
48 | SBE_Horvu | Starch branching enzyme | 2.4.1.18 | Hordeum vulgare | AAP72268 | 775
48 | GBE_Sacce | Glycogen branching enzyme | 2.4.1.18 | Saccharomyces cerevisiae | AAA34632 | 704
48 | GBE_Homsa | Glycogen branching enzyme | 2.4.1.18 | Homo sapiens | AAA58642 | 702
48 | MOTH_Brehe | Malto-oligosyl trehalohydrolase | 3.2.1.141 | Brevibacterium helvolum | AAB95369 | 589
the regulatory proteins AMPK [43], AKIN [48], and
SEX4 [44], as well as CBM53 repeats from starch syn-
thase III, are likely to have a functional binding site 1
in which a tyrosine corresponds to Trp590 in A. niger
GA SBD (Fig. 1).
CBMs only partly cluster according to the
appended catalytic domains
Sequence features apparent in the CBM alignment are
reflected in the corresponding evolutionary tree (Fig. 2).
Remarkably, CBM20s from amylases cluster with the
starch-binding/glycogen-binding CBM20 from laforin.
In the tree, the CBM20s described as possible ‘interme-
diates’ [34], i.e., from GH13 amylopullulanases (EC
3.2.1.1/41) and carbohydrate esterases (EC 3.1.1 ), are
between those CBM48s that are most intimately related
to the CBM20s and CBM21s. The starch synthase III
CBM53 repeats are most closely related to CBM21s
and are positioned on its border with the other,
CBM48, group, which is more distant from CBM20s
[36]. Thus sequences classified in CBM48 do not appear
in a common cluster (Fig. 2), the CBM48s from the
GH13 pullulanase subfamily [42] clustering together on
a branch adjacent to CBM21s, which are mostly
encountered in GAs, whereas the remaining CBM48s
from AMPK, AKIN1, SEX4 and related regulatory
proteins group next to CBM20s from genethonin-1,
GWD3, 4-α-glucanotransferase, and phosphodiesterase 5
(Fig. 2). This relationship – first shown before
family CBM48 was defined [36] – justifies CBM48s
being placed in a clan with CBM20s and CBM21s.
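Distance-based tree methods such as the neighbour-joining method used for Fig. 2 start from a matrix of pairwise distances between aligned sequences. The Python sketch below illustrates only that first step, using a simple p-distance (fraction of differing columns). The toy sequences and names are invented, and the gap handling (a gap against a residue counts as a difference, and columns gapped in both sequences are skipped) is an assumption made here, not necessarily the treatment applied by the software used for the published tree.

```python
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Fraction of compared alignment columns at which two sequences differ.

    Assumption: a gap paired with a residue counts as a difference, and
    columns where both sequences have a gap are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a != b:
            differing += 1
    return differing / compared if compared else 0.0


def distance_matrix(aligned: dict) -> dict:
    """Symmetric nested-dict matrix of pairwise p-distances."""
    names = list(aligned)
    matrix = {n: {m: 0.0 for m in names} for n in names}
    for n, m in combinations(names, 2):
        d = p_distance(aligned[n], aligned[m])
        matrix[n][m] = d
        matrix[m][n] = d
    return matrix


# Hypothetical aligned fragments; names and residues are illustrative only.
toy = {
    "CBM20_x": "TWGEN-YW",
    "CBM20_y": "TWGQN-YW",
    "CBM21_z": "SF-ENLYF",
}

dist = distance_matrix(toy)
closest = min(combinations(toy, 2), key=lambda pair: dist[pair[0]][pair[1]])
print("closest pair:", closest)
```

A neighbour-joining implementation would then repeatedly merge the pair of nodes that minimises its joining criterion; here the two toy CBM20 fragments are simply reported as the closest pair.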
CBM20 molecular structure
Conserved structural features of CBM20s
Three-dimensional structures have been reported for
seven of the nine SBD families defined so far: CBM20
[6,10]; CBM21 [38]; CBM25 and CBM26 [49]; CBM34
[50]; CBM41 [51]; and CBM48 [43]. No structures are
available for CBM45 and CBM53. All solved SBDs
show a β-sandwich fold with an immunoglobulin-like
topology. Ten CBM20 structures, including those of
both isolated CBMs and intact proteins possessing
CBM20s, have been determined using NMR or X-ray
crystallography (Table 2). The best characterized is
CBM20 from A. niger GA (GA SBD), which is used
here as the main representative of CBM20. Its structure
was determined by NMR in both a free and a
β-cyclodextrin-complexed state, and shows a well-defined
β-sandwich fold with eight β-strands distributed
in two β-sheets (Fig. 3) [6,8]. One sheet has five
antiparallel β-strands, whereas the other is made from
one parallel β-strand and an antiparallel β-strand pair
[52]. This fold makes an open-sided distorted β-barrel
with six loops of significant length, four of which are
well defined. β-Strand 3 is absent in CGTases [53–55]
and in maltogenic α-amylase [11]. The approximate
dimensions of GA SBD are 42 × 38 × 31 Å. One of
the GA SBD structures has β-cyclodextrin bound as a
starch mimic at both binding site 1 and binding site 2
[6], demonstrating the bivalent nature of this CBM20.
The N-terminus and C-terminus are located at opposite
ends of the longest axis of GA SBD (Fig. 3). A
disulfide bond (Cys509–Cys604) between the N-terminal
cysteine and the loop connecting β-strands 7 and 8
contributes to structural stability, and mutations of
both cysteines to glycine or serine resulted in destabilization,
measured as a 10 °C reduction in unfolding
temperature (Tm) and loss of 10 kJ·mol⁻¹ of free
energy, largely owing to an unfavourable change in
entropy [56].
The architecture and dynamics of the binding
sites
Binding site 1 consists of Trp543, Lys578, Trp590,
Glu591, and Asn595, and the indole rings of Trp543
and Trp590 form the central part of a carbohydrate-
Table 1. Continued.
Family | Abbreviation | Specificity | EC | Source | GenPept | Length
48 | MOTH_Sulso | Malto-oligosyl trehalohydrolase | 3.2.1.141 | Sulfolobus solfataricus | BAA11010 | 558
48 | PUL_Klepn | Pullulanase | 3.2.1.41 | Klebsiella pneumoniae | AAA25124 | 1102
48 | PUL_Horvu | Pullulanase (limit dextrinase) | 3.2.1.41 | Hordeum vulgare | AAD04189 | 904
48 | ISO_Pseam | Isoamylase | 3.2.1.68 | Pseudomonas amyloderamosa | AAA25854 | 771
48 | ISO_Sulso | Isoamylase | 3.2.1.68 | Sulfolobus solfataricus | AAK42273 | 718
48 | ISO_Orysa | Isoamylase | 3.2.1.68 | Oryza sativa | BAA29041 | 733
53 | SS3a_Arath | Starch-synthase III – copy 1 | 2.4.1.21 | Arabidopsis thaliana | NP_172637 | 1025
53 | SS3b_Arath | Starch-synthase III – copy 2 | 2.4.1.21 | Arabidopsis thaliana | NP_172637 | 1025
53 | SS3c_Arath | Starch-synthase III – copy 3 | 2.4.1.21 | Arabidopsis thaliana | NP_172637 | 1025
binding platform. This shallow and solvent-exposed
binding site is characterized by a small ligand contact
area, and undergoes very little structural change upon
binding [8]. By contrast, binding site 2, defined by
Thr526, Tyr527, Gly528, Glu529, Asn530, Asp554,
Tyr556, and Trp563, is more extended and undergoes
a large conformational rearrangement upon binding of
β-cyclodextrin, indicating that it has higher structural
plasticity than binding site 1 [8]. The main change is
observed in loop regions close to the C-terminal end
(Fig. 3). This part of binding site 2 is composed of a
flexible loop (Fig. 3), and in the GA SBD β-cyclodextrin
complex, Tyr556 approaches Asp554 and Lys555,
inducing a substantial change in the position of
Asp560 and resulting in a more than 13 Å movement
of the Cα atom of this residue [8]. At binding site 2,
carbohydrate–protein contacts are dominated by van
der Waals stacking interactions, primarily provided by
Tyr527, Tyr556 and, to a lesser extent, Trp563. The
involvement of aromatic side chains in ligand binding
was confirmed by site-directed mutagenesis and UV
difference spectroscopy [7]. Structural plasticity at
binding site 2 seems to be a general property of
CBM20s, and significant conformational changes were
also observed in the loop of residues 460–465 in
β-amylase from B. cereus (Protein Data Bank code:
1B9Z), and the loop of residues 627–636 in CGTase
from B. circulans strain 251 (Protein Data Bank code:
1CDG), when the proteins were crystallized in complex
with maltose [9,12].
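Movements such as the more than 13 Å shift of a Cα atom are obtained by comparing the coordinates of equivalent atoms in the free and ligand-bound structures. The Python sketch below computes per-residue Cα displacements and their root-mean-square value. It assumes that both coordinate sets have already been superposed in a common frame, and the coordinates are invented so that the arithmetic is easy to follow; they are not taken from any Protein Data Bank entry.

```python
import math


def ca_displacements(free: dict, bound: dict) -> dict:
    """Per-residue Cα displacement (Å) between two pre-superposed structures.

    Only residues present in both coordinate dictionaries are compared.
    """
    return {
        residue: math.dist(free[residue], bound[residue])
        for residue in free
        if residue in bound
    }


def rms(values) -> float:
    """Root-mean-square of a collection of displacements."""
    values = list(values)
    if not values:
        raise ValueError("no displacements to average")
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical Cα coordinates in Å (x, y, z); residue labels reuse names
# from the text, but the numbers are illustrative only.
free_state = {
    "Asp554": (0.0, 0.0, 0.0),
    "Tyr556": (3.8, 0.0, 0.0),
    "Asp560": (10.0, 2.0, 1.0),
}
bound_state = {
    "Asp554": (0.0, 1.0, 0.0),
    "Tyr556": (3.8, 2.0, 0.0),
    "Asp560": (22.0, 6.0, 4.0),
}

moves = ca_displacements(free_state, bound_state)
largest = max(moves, key=moves.get)
overall = rms(moves.values())

for residue, shift in moves.items():
    print(f"{residue}: {shift:.1f} Å")
print(f"largest movement: {largest}; RMS over all residues: {overall:.2f} Å")
```

Reporting the per-residue values alongside the RMS figure is useful because a single large loop movement, as described for binding site 2, can be hidden in an average over an otherwise rigid domain.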
Two maltose molecules were bound at the surface of
CBM20 of B. circulans strain 251 CGTase, and a third
was bound on the catalytic domain [9]. Binding site 1
includes Trp616 and Trp662, which stack onto both
glucose rings of the maltose. Direct hydrogen bonds
with Lys651 and Asn667 and water-mediated hydrogen
bonds with main chain carbonyl oxygens of Trp616 and
Glu663 further strengthen the maltose binding. In bind-
ing site 2, Tyr633 stacks on one of the glucose rings of
maltose, whereas the side chains of Thr598 and Asn627,
the main chain carbonyl oxygen of Ala599 and the main
chain amide nitrogen of Gly601 form direct hydrogen
bonds with maltose. A water-mediated hydrogen bond
is observed between maltose and the Asn603 side chain.
Binding site 2 is situated at the entry of the groove lead-
ing to the active site (Fig. 4), indicating that its function
may be a combination of starch binding and sequester-
ing single glucan chains into the active site. Indeed, the
side chain of Leu600 in the B. circulans strain 251
CGTase, which is a part of binding site 2, points into
the solvent and sterically confines the accessibility of this
site to single carbohydrate chains (Fig. 4B). In other
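Table 2. A summary of CBM20 three-dimensional structures. WT, wild type.
Specificity | Source | Protein Data Bank code | Ligand | Form | References
Cyclodextrin glucanotransferase GH13 | Bacillus circulans 8 ᵃ | 1CGT | Free | WT | [53]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans 8 ᵃ | 1CGU | β-Cyclodextrin | WT ᵇ | [139]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans 8 ᵃ | 5CGT | Maltotriose | WT ᵇ | [140]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1CXI | Free | WT | [54]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1CDG | Maltose | WT | [9]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1CXH | Maltoheptaose | WT | [54]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1CXE | α-Cyclodextrin | WT | [54]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1CXK | γ-Cyclodextrin | WT ᵇ | [141]
Cyclodextrin glucanotransferase GH13 | Bacillus circulans strain 251 ᵃ | 1TCM | Free | W616A | [10]
Cyclodextrin glucanotransferase GH13 | Bacillus sp. 1011 ᵇ | 1PAM | Free | WT | [55]
Cyclodextrin glucanotransferase GH13 | Bacillus stearothermophilus no. 2 | 1CYG | Free | WT | Not available
Cyclodextrin glucanotransferase GH13 | Thermoanaerobacterium thermosulfurigenes strain EM1 ᵃ | 1CIU | Free | WT | [142]
Cyclodextrin glucanotransferase GH13 | Thermoanaerobacterium thermosulfurigenes strain EM1 ᵃ | 1A47 | Maltohexaose inhibitor | WT | [143]
Maltogenic α-amylase GH13 | Geobacillus stearothermophilus strain C599 | 1QHP | Maltose | WT | [11]
β-Amylase GH14 | Bacillus cereus ᵃ | 1B90 | Free | WT | [12]
β-Amylase GH14 | Bacillus cereus ᵃ | 1B9Z | Maltose | WT | [12]
β-Amylase GH14 | Bacillus cereus ᵃ | 1CQY | Free | SBD (418–516) | [114]
Glucoamylase GH15 | Aspergillus niger | 1KUL | Free | SBD (509–616) | [6]
Glucoamylase GH15 | Aspergillus niger | 1AC0 | β-Cyclodextrin | SBD (509–616) | [8]
Glucoamylase GH15 | Hypocrea jecorina | 2VN4 | Free | WT | [62]
FLJ11085 ᶜ | Homo sapiens | 2Z0B | Free | WT | Not available
ᵃ For these enzymes representative structure entries are listed from the many available.
ᵇ Mutation was in the catalytic domain.
ᶜ Putative glycerophosphodiester phosphodiesterase.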
CBM20s, phenylalanine, tyrosine or other bulky resi-
dues at this position may serve a similar purpose. This
architecture, where an aromatic or a bulky residue
defines a binding site accessible to single carbohydrate
chains, is also observed in the recently determined struc-
ture of GA CBM21 from R. oryzae [57], and is remi-
niscent of the surface binding site observed on the
C-terminal domain of barley α-amylase 1 [58,59]. Interestingly,
barley α-amylase has a second surface binding
site formed by two consecutive tryptophan residues
(Trp278 and Trp279 in the low-pI barley α-amylase;
Protein Data Bank code: 1HT6), which bears a close
resemblance to binding site 1 in CBM20. It remains to
be explored whether these architectural similarities
confer similar functionalities in starch binding in these
proteins.
Complex structures with bound ligands are available for
a few more CBM20s (Table 2). Those of B. circulans strain 8
CGTase with β-cyclodextrin (Protein Data Bank code: 1CGU)
or maltotriose (Protein Data Bank code: 5CGT) and of
G. stearothermophilus strain C599 maltogenic α-amylase
with maltose (Protein Data Bank code: 1QHP)
highlight the importance of van der Waals contacts
with the indole groups of the conserved Trp543 and
Trp590 in binding site 1 (A. niger GA numbering).
Moreover, other polar residues, such as Asn595 and
Lys598 (A. niger GA numbering) in binding site 1, are
likely to form direct or solvent-mediated hydrogen
bonds with larger ligands such as starch, and the conserved
Lys598 packs on Trp543, thus contributing to a
more rigid conformation of the aromatic platform.
Binding site 2 is structurally more diverse, and shows
differences in sequence as well as with respect to the
residues involved in hydrogen bonding to ligands.
Notably, no bound carbohydrate was observed at
binding site 2 in the G. stearothermophilus strain C599
maltogenic α-amylase and in B. cereus β-amylase maltose
complexes [12,60]. A new conserved carbohydrate-binding
site was identified on the catalytic domain only
in bacterial β-amylases [61].
Linker regions and interaction with the catalytic domain
SBDs are connected to their catalytic domains in different ways. Most GAs have linker regions separating the SBD from the catalytic domain, whereas CGTases, for example, lack such linkers. The A. niger GA1 form, which includes both the catalytic domain and the polypeptide linker-connected SBD, was not crystallized. However, the recently solved structure of Hypocrea jecorina GA [62] provides information on the spatial arrangement of the catalytic module relative to the CBM20. The SBD of H. jecorina GA is quite similar to the A. niger GA SBD determined by NMR [6], and the structures show an rmsd of 1.7 Å for 99 aligned Cα atoms. In H. jecorina GA, binding site 1 is located on the SBD on the side opposite to the variable loop region and the catalytic domain, whereas binding site 2 is near the catalytic domain. This juxtaposition of the SBD relative to the catalytic domain is similar to the architecture of CGTase from B. circulans strain 251 [9], and also seems to hold for several full-length enzymes possessing CBM20s, e.g. maltogenic α-amylase from G. stearothermophilus strain C599, and CGTases from Bacillus sp. 1011 and Thermoanaerobacterium thermosulfurigenes strain EM1 (Table 2), suggesting a conserved architecture for a phylogenetically diverse group of enzymes.
Fig. 3. A cartoon representation of the A. niger GA CBM20 (Protein Data Bank code: 1KUL) showing binding sites 1 (A) and 2 (B). The cartoon is coated by a transparent molecular surface representation to give a topological perspective of the binding sites. The N-terminus and the C-terminus are in yellow and blue, respectively. The flexible loop showing the largest conformational change upon ligand binding is in red. Ligand-binding aromatic residues and other selected residues implicated in ligand interactions are shown as sticks. The view in (B) is rotated about 180° along the long axis of the molecule.
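The rmsd of 1.7 Å quoted above for the two SBD structures is a root-mean-square deviation over paired Cα positions. The following is a minimal sketch of that calculation, assuming the two coordinate sets have already been superimposed and that atoms are paired in order; the function name and the toy coordinates are illustrative assumptions and are not taken from the 1KUL or 2VN4 entries.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two lists of (x, y, z) points.

    Assumes the structures are already superimposed and that the i-th
    point in coords_a is paired with the i-th point in coords_b.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))

# Toy, made-up coordinates in angstroms (hypothetical, for illustration only):
# the second set is the first one shifted by 1 A along z.
toy_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0), (2.0, 0.0, 1.0)]
toy_rmsd = rmsd(toy_a, toy_b)
```

In practice the reported value depends on which residues are aligned and on the superposition procedure, so a figure such as 1.7 Å for 99 Cα atoms is only comparable between studies that use the same alignment.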
The extent of interaction between the catalytic modules of GAs and the CBM20s joined to them by polypeptide linkers remains an open question. The H. jecorina GA structure reveals an intimate interaction between the SBD and the catalytic domain, and the linker region has a well-defined electron density and extends as a random coil interacting with the catalytic module through side chain contacts [62]. The rather compact conformation and the orientation of the SBD relative to the catalytic domain suggest that the SBD is important in directing the enzyme to regions where the starch granular structure is disrupted. By contrast, the low-resolution structure of the intact GA1 from A. niger in solution, recently determined with the aid of small-angle X-ray scattering, reveals an extended conformation where the highly O-glycosylated polypeptide linker separates the two domains of the enzyme [63]. Interestingly, the linker of the A. niger GA is 22 amino acids longer than that of H. jecorina. These two examples indicate that the modules of fungal GAs vary in structural organization and in flexibility of the domains relative to each other, promoting a fine-tuned mode of action towards the natural substrates.
Ligand binding and role of CBM20s in enzyme catalysis
Binding site topology classification
On the basis of the topology of ligand-binding sites, CBMs are classified into three distinct types: A-type, B-type, and C-type. The A-type has a planar hydrophobic binding surface that recognizes highly crystalline polysaccharides such as cellulose and chitin [64]. The B-type, in contrast, has a binding cleft or groove with at least two subsites accommodating a single polysaccharide chain [65–67]. CBM20s belong to this type, together with the majority of identified CBMs. Typically, the binding affinity of B-type CBMs strongly depends on ligand size. Thus, affinity increasing with ligand length up to maltononaose was demonstrated for GA SBD, whereas its interaction with oligosaccharides of degree of polymerization (DP) < 4 was negligible [64,68]. B-type CBMs do not bind to planar surfaces such as those found in highly crystalline polysaccharides, e.g. cellulose. Moreover, in B-type as opposed to A-type binding, direct hydrogen bonds play a key role in defining affinity and ligand specificity [67,69]. C-type CBMs recognize termini of polysaccharides in a solvent-exposed binding pocket or a blind canyon, and have a preference for small sugars, optimally binding monosaccharides, disaccharides, and trisaccharides.
The B-type binding thermodynamics in CBM20
According to the binding site topology classification of CBMs [64], ligand binding to B-type modules is accompanied by an unfavourable change of entropy, which is compensated for by a large, favourable enthalpic contribution that dominates the binding free energy change. This pattern is indeed corroborated by the energetics of β-cyclodextrin binding to A. niger GA SBD, giving ΔG and ΔH of −26.7 kJ·mol⁻¹ and −58 kJ·mol⁻¹, respectively, at pH 7.0 and 25 °C [56], in
Fig. 4. (A) Surface representation of CGTase from B. circulans strain 251 in complex with maltose (Protein Data Bank code: 1CDG). This structure illustrates the close proximity of CBM20 binding site 2, represented by Tyr633 (yellow) and Leu600 (red), to the active site cleft and the catalytic nucleophile Glu257 (green). Three bound maltose molecules are shown as sticks at binding sites 1 and 2, and at a third site on the catalytic domain of CGTases (upper part of the molecule). (B) Close-up revealing architectural features of CBM20 binding site 2, with Leu600 protruding into the solvent and restricting the access to this site to single α-glucan chains. The structure was rendered using PYMOL v0.99 software (DeLano Scientific LLC, Palo Alto, CA).
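As a worked illustration of the enthalpy-entropy compensation described for β-cyclodextrin binding to A. niger GA SBD, the sketch below derives the entropic term and an approximate dissociation constant from the ΔG and ΔH values quoted from [56]. It relies on the standard relations ΔG = ΔH − TΔS and ΔG = RT ln Kd; the 1 M standard state and the variable names are assumptions made here, not details given in the text.

```python
import math

# Values quoted in the text for beta-cyclodextrin binding to A. niger GA SBD [56]
delta_g = -26.7   # binding free energy, kJ/mol
delta_h = -58.0   # binding enthalpy, kJ/mol
temperature = 298.15  # 25 degrees C in kelvin

# Molar gas constant in kJ/(mol*K)
R = 8.314462618e-3

# Entropic term from delta_g = delta_h - T*delta_s
t_delta_s = delta_h - delta_g               # kJ/mol; negative means unfavourable
delta_s = t_delta_s / temperature * 1000.0  # J/(mol*K)

# Dissociation constant from delta_g = R*T*ln(Kd), assuming a 1 M standard state
kd = math.exp(delta_g / (R * temperature))  # mol/L

print(f"T*dS = {t_delta_s:.1f} kJ/mol, dS = {delta_s:.0f} J/(mol*K), Kd = {kd:.1e} M")
```

Under these assumptions the entropic term is about −31.3 kJ·mol⁻¹, so roughly half of the favourable enthalpy is offset by entropy, and the corresponding Kd is of the order of 10⁻⁵ M.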
[...] that the SBD is an integral part of the CGTase structure, and that the intimate interactions and native spatial alignment relative to the catalytic module are crucial for the stability and catalytic performance, as opposed to GA SBD [107] and α-amylase SBD [102]. This is further corroborated by the structure of G. stearothermophilus CGTase, where SBD binding site 2 is close to the active site on the catalytic [...]
[...] rendering their function and structural stability much more dependent on the integral architecture. This is evident from the loss of activity upon the swapping of CBM20 of B. macerans CGTase with that of A. awamori GA. Another intriguing point is the role of the two binding sites present in most CBM20s. The different affinities of these binding sites and their spatial arrangement with respect to the catalytic [...]
[...] compared with the wild type. The role of the CGTase SBD was further investigated by swapping it with the homologous (60% similarity) A. awamori GA SBD [106]. Although both domains were shown to retain their binding activities as independent domains, the replacement [...]
[...] instrumental for a better understanding of their physiological roles.
CBM20s and bioengineering
CBM20-containing fusion proteins
Exploiting the CBM20 binding functionality in protein engineering
As CBM20s generally maintain their structural fold and affinity for starch, they have been explored as affinity purification tags in protein fusions. CBM20s of Bacillus macerans CGTase and A. awamori GA thus retain strong [...]
[...] other CBM20s discussed above. Hence, cooperativity between SBD binding sites and the catalytic domain is conceivable, and has been verified by mutational analysis of the two sites in CGTase from B. circulans strain 251, resulting in the Hill coefficient being reduced from 1.78 for the wild type to 1.3 and 1.05 for the enzyme with W616A and Y633A mutations at binding sites 1 and 2, respectively [10]. CBM20 [...]
[...] affinity of binding to insoluble starch can also be described using linear adsorption isotherms.
Table 3. Binding of CBM20s to soluble oligosaccharides. Columns: CBM20 (expression host); ligand and method. Rows shown: GA SBD (A. niger) with maltose (a), maltoheptaose (a), maltododecaose (a) [...]
[...] Lys576 and Trp588 with either leucine or isoleucine significantly reduced the raw starch-binding capacity [31]. Thus, adsorption levels for single mutants were ~60%, as compared with ~74% for the C-terminally truncated protein, which resembles the wild type in this case. Double and triple mutations resulted in modest further reductions in adsorption to ~20–50%, with the [...]
[...] in the bioengineering of starches. The design of hydrolases for noncook raw starch processes for bioethanol industries is an urgent and important task, where CBM20s and other SBD types have the potential to increase hydrolytic efficiency and rates. Finally, the continuous updating of databases with new sequences has enabled a more robust analysis of the evolutionary relationships within CBM20s and in the
context of the related families CBM21, CBM48, and CBM53. The increase in the number of the sequences, however, both reveals the challenges of unambiguous family assignment and represents a source for future discoveries of new functionalities and applications.
Acknowledgements
This work was supported by the Danish Natural Science Research Council, the Danish Research Council for Technology and Production [...] Abou Hachem), the Carlsberg Foundation, and a FOBI PhD scholarship (C. Christiansen). Š. Janeček thanks the Slovak grant agency VEGA for grant No 2 0114 08 and the Ministry of Education of the Slovak Republic for project AV-4 202 3 08.
References
1 Martin C & Smith AM (1995) Starch biosynthesis. Plant Cell 7, 971–985.
2 Tester RF, Karkalas J & Qi X (2004) Starch [...]