REVIEW ARTICLE

The C-type lectin-like domain superfamily

Alex N. Zelensky and Jill E. Gready

Computational Proteomics and Therapy Design Group, John Curtin School of Medical Research, Australian National University, Canberra, Australia

Subdivision: Proteomics

FEBS Journal 272 (2005) 6179–6217 © 2005 The Authors Journal compilation © 2005 FEBS

Keywords
C-type lectin-like domain; domain superfamily; protein evolution; carbohydrate binding

Correspondence
J. E. Gready, Computational Proteomics and Therapy Design Group, Division of Molecular Bioscience, John Curtin School of Medical Research, PO Box 334, Canberra ACT 2601, Australia
Fax: (+)61 2 6125 0415
Tel.: (+)61 2 6125 8303
Website: http://jcsmr.anu.edu.au/dbmb/gready/gready.htm

(Received 31 July 2005, revised 17 October 2005, accepted 24 October 2005)

doi:10.1111/j.1742-4658.2005.05031.x

The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. The CTLD structure has a characteristic double-loop (‘loop-in-a-loop’) stabilized by two highly conserved disulfide bridges located at the bases of the loops, as well as a set of conserved hydrophobic and polar interactions. The second loop, called the long loop region, is structurally and evolutionarily flexible, and is involved in Ca2+-dependent carbohydrate binding and interaction with other ligands. This loop is completely absent in a subset of CTLDs, which we refer to as compact CTLDs; these include the Link/PTR domain and bacterial CTLDs. CTLD-containing proteins (CTLDcps) were originally classified into seven groups based on their overall domain structure. Analyses of the superfamily representation in several completely sequenced genomes have added 10 new groups to the classification, and shown that it is applicable only to vertebrate CTLDcps; despite the abundance of CTLDcps in the invertebrate genomes studied, the domain architectures of these proteins do not match those of the vertebrate groups. Ca2+-dependent carbohydrate binding is the most common CTLD function in vertebrates, and apparently the ancestral one, as suggested by the many humoral defense CTLDcps characterized in insects and other invertebrates. However, many CTLDs have evolved to specifically recognize protein, lipid and inorganic ligands, including the vertebrate clade-specific snake venoms, and fish antifreeze and bird egg-shell proteins. Recent studies highlight the functional versatility of this protein superfamily and the CTLD scaffold, and suggest that further interesting discoveries have yet to be made.

Abbreviations
CRD, carbohydrate recognition domain; CTLD, C-type lectin-like domain; CTLDcp, CTLD-containing protein; DC-SIGN, dendritic cell-specific ICAM-grabbing nonintegrin; EST, expressed sequence tag; MBP, mannose-binding protein; NK, natural killer cell; PSP, pulmonary surfactant protein; PTR, protein tandem repeat.

Introduction

The superfamily of proteins containing C-type lectin-like domains (CTLDs) is a large group of extracellular Metazoan proteins with diverse functions. It has been the subject of a few general literature reviews [1,2], and of many more that focus on particular functions (e.g. [3,4]). There are also several systematic studies [5–9]. A classification of the family members based on the overall domain architecture of the CTLD-containing proteins (CTLDcps), which was introduced by Drickamer in 1993 [2] and updated recently [6], has served as a useful framework for studies of the superfamily. However, despite a voluminous literature describing some of the family’s properties in great detail, we feel that a fresh critical review would be useful, as the previous review of this scale was published more than a decade ago [2]. Our approach has several main goals, outlined below.

The literature is strongly biased towards several groups of mammalian proteins, many of which are of greater biomedical interest. In this review we have tried to capture the superfamily in all its variety, rather than to describe the known members in proportion to the amount of published data. In particular, we wanted to integrate the results of the systematic studies of CTLDs from lower vertebrates, such as snake venom proteins and fish CTLDs, with the classification of mammalian CTLDs. The recent inclusion of new CTLDcp groups inspired a critical reassessment of the principles on which the current domain-based classification was built. We also wanted to summarize the functional data on invertebrate CTLDs, which to our knowledge have never been reviewed at a general level.

In addition, numerous structural studies of CTLDs in the last decade have provided much information on the inner workings of the fold and the mechanisms of Ca2+-dependent carbohydrate binding. We have attempted to generalize these data and outline the most common elements of the domain. An important correlation between the residue composition of the primary carbohydrate-binding site and its basic specificity towards mannose- or galactose-group monosaccharides was discovered early in the history of CTLD studies and remains the most useful means of predicting CTLD function. However, several models proposed to explain the mechanism of this correlation had to be rejected as the volume of data grew, and no comprehensive explanation of this fundamental phenomenon has been published. Our goal was to analyze the current state of the literature on this problem, to see if an explanation is apparent.
Finally, we wanted to address the inconsistencies in the terminology of the CTLDcp superfamily that exist in the literature, and to suggest clear definitions for the relevant terms.

The CTLD superfamily

A brief history of discovery

C-type lectins were among the first animal lectins discovered. Bovine conglutinin, which belongs to the collectin group of C-type lectins, has been known since 1906, and the agglutinating activity of the snake venom lectins was first described much earlier, in 1860 [10]. In 1988 Drickamer proposed organizing animal lectins into several categories, and classified Ca2+-dependent lectins structurally similar to the asialoglycoprotein receptor as the C-type lectin group [11]. Since then, the known family has grown significantly, and now includes more than a thousand identified members (including those known from genome sequences only) from different animal species, most of which lack lectin activity.

Term definitions: CTLD, CRD, C-type lectin

The terms ‘C-type lectin’, ‘carbohydrate recognition domain’ (CRD), ‘C-type lectin domain’ (CTLD) and ‘C-type lectin-like domain’ (also abbreviated as CTLD) are often used interchangeably in the literature. This may be a source of confusion. The history of the introduction and the common meanings of the terms are outlined below, followed by the definitions we will use in this review.

The term ‘C-type lectin’ was introduced to distinguish a group of Ca2+-dependent (C-type) carbohydrate-binding (lectin) animal proteins from the other (Ca2+-independent) types of animal lectins. When the structures of C-type lectins were established biochemically and the functions of different domains were defined, it was found that carbohydrate-binding activity was mediated by a compact module, the ‘carbohydrate-recognition domain’ (CRD), which was present in all Ca2+-dependent lectins but not in other types of animal lectins [11–13].
Comparison of CRD sequences from different C-type lectins revealed conserved residue motifs characteristic of the domain [2,11,13], which allowed discovery of many more proteins that contained it. At the same time, crystallographic studies confirmed that the CRD of the C-type lectins has a compact globular structure, which was not similar to any known protein fold [14]. This domain has been called ‘C-type CRD’ or ‘C-type lectin domain’. As the number of determined sequences grew, it became clear that not all proteins containing C-type CRDs can actually bind carbohydrates or even Ca2+. To resolve the contradiction, a more general term, ‘C-type lectin-like domain’, was introduced to refer to such domains [1,3]. The usage of this term is, however, somewhat ambiguous, as it is used both as a general name for the group of domains with sequence similarity to C-type lectin CRDs (regardless of their carbohydrate-binding properties), and as a name for the subset of such domains that do not bind carbohydrates, with the subset that does bind carbohydrates being called C-type CRDs [6,8]. Also, both the ‘C-type CRD’ and ‘C-type lectin domain’ terms are still being used in relation to C-type lectin homologues that do not bind carbohydrate (e.g. [15–17]), and the group of proteins containing the domain is still often called the ‘C-type lectin family’ or ‘C-type lectins’, although most of them are not in fact lectins. The abbreviation CRD is used both in the meaning of ‘C-type carbohydrate-recognition domain’ and in the more general meaning of ‘carbohydrate-recognition domain’, which encompasses domains from different lectin groups [8]. Occasionally CRD is also used to designate the short amino-acid motifs (i.e. amino-acid domain) within CTLDs that directly interact with Ca2+ and carbohydrate (e.g. [18]).
Structure comparisons add another meaning to the definition of the C-type lectin domain, as structural similarities have been discovered between C-type lectin CRDs and protein domains that did not show significant sequence similarity to any of the known C-type lectins but adopted a similar fold [19–23]. As the fold is very unusual, these domains have been placed in a common group in structure classification databases. For example, in the SCOP database [24] C-type lectins and structurally related domains are grouped at the fold level (‘C-type lectin-like fold’), which is the second level from the top of the classification hierarchy. However, although the structural similarity is often acknowledged in the literature, the common meaning of the C-type lectin-like domain does not include these domains [1,6].

Here we will use the term ‘C-type lectin-like domain’ (CTLD) in its broadest definition to refer to protein domains that are homologous to the CRDs of the C-type lectins, or that have a structure resembling that of the prototypic C-type lectin CRD. Proteins harboring this domain will be called CTLD-containing proteins (CTLDcps) instead of the more common ‘C-type lectins’, as the latter implies a carbohydrate-binding ability that most of the CTLDcps are not known to possess.

Phylogenetic distribution, groups

With a few exceptions, which will be discussed below, CTLDs are found only in Metazoa, and only extracellularly. The domain has been a very popular framework evolutionarily for generating new functions and is found in various structural and functional contexts. CTLDcps are ubiquitous in multicellular animals, and are found in a broad range of species, from sponges to human [6,25]. CTLDcp-encoding genes have been found in all fully sequenced Metazoan genomes, generally in large numbers. For example, the CTLD is the seventh most abundant domain family in Caenorhabditis elegans [26].
The family shows both evolutionary flexibility and conservation. Whole-genome studies have shown that although there are virtually no similarities between CTLDcps from worm, fruit fly and vertebrates [8], relatively few modifications occurred within the vertebrate lineage during evolution from fish to mammals [9], with some members showing sequence conservation approaching that of histones.

Non-Metazoan CTLDs

There are several interesting examples of non-Metazoan CTLDcps, which can be divided into two groups. Members of the first group come from parasitic bacteria and viruses; these are involved in interactions with the animal host and are either hijacked host proteins or their imitations. This group includes bacterial toxins (pertussis toxin [23] and proaerolysin [22]), outer membrane adhesion proteins (intimin from enteropathogenic Escherichia coli [21] and invasin from Yersinia pseudotuberculosis [27]) and viral proteins. Viral CTLDcps are either transmembrane proteins or structural envelope proteins, and include, for example, eight ORFs in the fowlpox virus genome [28] and proteins from vaccinia virus [29,30], African swine fever virus [31], cowpox virus [32], avian adenovirus gal1 [33], myxoma virus [34], molluscum contagiosum [35], Epstein-Barr virus [36], and alcelaphine herpesvirus [37]. Unlike bacterial CTLDs, which were assigned to the CTLD superfamily on the basis of structural similarity only, viral proteins contain a canonical CTLD with significant similarity to those in mammalian CTLDcps.

While the presence of CTLDcps in parasites has an obvious rationalization, the origin of another group of non-Metazoan CTLDcps is unclear. We have found three proteins that can be assigned to this group: two proteins from plants, and a putative protein encoded by an ORF from a marine planctomycete Pirellula sp. (GenBank ID:32443381).
The latter sequence, which is 7716 amino acids long and is encoded by the biggest ORF in the genome of that bacterium [38], contains several C-type lectin-like, laminin G and cadherin domains, all of which are found almost exclusively in Metazoa. The most parsimonious explanation for the presence of all these domains in the Pirellula genome is horizontal gene transfer, but the function of the protein harboring them is a mystery, as Pirellula are free-living species. The plant CTLDcp sequences originate from the Arabidopsis thaliana genome annotation (transcript IDs At4g22160 and At1g52310) and are not characterized functionally. At1g52310 is a transmembrane protein with a typical CTLD in the extracellular domain and a protein kinase domain in the cytoplasmic part; it has a well-conserved orthologue in the rice genome sequence.

It is not absolutely clear whether the CTLD superfamily is monophyletic, as homology between the canonical and some of the compact CTLDs (see below) cannot be confidently established. There seems little doubt that the Link domain group of CTLDs emerged as a result of a deletion of the long loop region from an ancestral canonical CTLD, because the Link domains have a much narrower phylogenetic distribution (they are found only in vertebrates), are less diverse, and show detectable sequence similarity to the canonical CTLDs [19]. However, the evolutionary relationship of the compact CTLDs from the bacterial toxins to the animal CTLDs is uncertain [39]. These domains could either have been acquired by horizontal transfer or have arisen by convergent evolution, as mimicry of host proteins.

The CTLD fold

The CTLD fold has a double-loop structure (Fig. 1).
The overall domain is a loop, with its N- and C-terminal β strands (β1, β5) coming close together to form an antiparallel β-sheet. The second loop, which is called the long loop region, lies within the domain; it enters and exits the core domain at the same location. Four cysteines (C1-C4), which are the most conserved CTLD residues, form disulfide bridges at the bases of the loops: C1 and C4 link β5 and α1 (the whole domain loop) and C2 and C3 link β3 and β5 (the long loop region). The rest of the chain forms two flanking α helices (α1 and α2) and the second (‘top’) β-sheet, formed by strands β2, β3 and β4. The long loop region is involved in Ca2+-dependent carbohydrate binding, and in domain-swapping dimerization of some CTLDs (Fig. 2), which occurs via a unique mechanism [40–44].

Fig. 1. CTLD structure. A cartoon representation of a typical CTLD structure (1k9i). The long loop region is shown in blue. Cystine bridges are shown as orange sticks. The cystine bridge specific for long form CTLDs (C0-C0′) is also shown.

The conserved positions involved in CTLD fold maintenance and their structural roles have been discussed in detail elsewhere [5]. In addition to the four conserved cysteines, one other sequence feature needs to be mentioned here: the ‘WIGL’ motif. It is located on the β2 strand, is highly conserved and serves as a useful landmark for sequence analysis.

Variations of the fold: canonical, compact, long, short

Structurally, CTLDs can be divided into two groups: canonical CTLDs, which have a long loop region, and compact CTLDs, which lack it (Fig. 2). The second group includes Link or protein tandem repeat (PTR) domains [19,20] and bacterial CTLDs [27,39,45]. Another family usually included in the CTLD superfamily is that of endostatin [1,24,46]. However, in the comparative structure analysis [5], we did not find substantial similarity between the CTLD and endostatin folds, apart from the general topology. As sequence similarity between endostatin and CTLDs is also absent, we do not regard the endostatin fold as an example of a CTLD and do not consider it further.

Fig. 2. Variation of the long loop region structure. Three common forms of the CTLD long loop region are shown. Panels (A) and (C) show canonical CTLDs in which the long loop region is tightly packed (A) or flipped out to form a domain-swapping dimer (C). A compact CTLD from human CD44 Link domain is shown in panel (B). The core domain and long loop region are colored green and blue, respectively.

Another subdivision of CTLDs is based on the presence of a short N-terminal extension, which forms a β-hairpin at the base of the domain (Fig. 1). The CTLDs containing such an extension are called ‘long form’. The hairpin is stabilized by an additional cystine bridge, and the presence of these two additional cysteines at the beginning of the CTLD sequence is used to distinguish between long and short form CTLDs in sequence analysis. No systematic study of the N-terminal extension, or of its possible roles, has been published.

Secondary structure element numbering

Although the CTLD fold is very well conserved among its known representatives, there is no general agreement in the literature on the numbering of CTLD secondary structure elements. The numbering scheme used for the first solved CTLD structure (rat MBP-A [14]) included five strands, two helices and four loops. However, this description turned out to be insufficient, as MBP lacks some secondary structure elements that are present in long-form CTLD structures, while other small strands were not defined.
Other reports describing the structures of CTLDs that have a different number of secondary structure elements than MBP-A either introduced their own numbering (β strands 1–6 in asialoglycoprotein receptor (ASGPR [47]); six β strands in Link module, with labeling not consistent with ASGPR or MBP-A [20]; β1-β7 in NKG2D [48]; β1-β8 in EMBP [49]), or extended the secondary structure element naming scheme used for MBP-A (Ly49A secondary structure element numbering is consistent with that in MBP-A [50]). For consistency we will use a universal numbering scheme ([5], Fig. 3), taking the same approach as was used for the Ly49A structure; this allows direct reference to the most studied CTLD structures (MBP-A and -C) and assigns individual numbers to the elements that are present throughout the family. Other elements will be given derived names and numbers: the β strand specific for the long-form CTLD is labeled β0, the short β strand between α1 and α2 is labeled β1′, and the two β strands forming a hairpin C-terminal to β2 are labeled β2′ and β2′′.

Fig. 3. CTLD secondary structure element numbering. Ribbon diagrams for a compact (intimin, 1f00) and a canonical (E-selectin, 1g1t) CTLD structure. The long loop region in E-selectin, and the short α helix, which replaces the long loop region in compact CTLDs, are shown in black. Secondary structure elements are numbered according to the universal numbering scheme [5].

Ca-binding sites

Four Ca2+-binding sites are found in CTLDs

Four Ca2+-binding sites recur in CTLD structures from different groups (Fig. 4). The site occupancy depends on the particular CTLD sequence and on the crystallization conditions [14,51]; in different known structures zero, one, two or three sites are occupied.
Sites 1, 2 and 3 are located in the upper lobe of the structure, while site 4 is involved in salt bridge formation between α2 and the β1/β5 sheet. Sites 1 and 2 were observed in the structure of rat MBP-A complexed with holmium, which was the first CTLD structure determined [14]. Site 3 was first observed in the MBP-A complex with Ca2+ and oligomannose asparaginyl-oligosaccharide [51]. It is located very close to site 1, and all the side chains coordinating Ca2+ in site 3 are also involved in site 1 formation. As biochemical data indicate that MBP-A binds only two calcium atoms [52], Ca2+-binding site 3 is considered a crystallographic artifact [51]. However, in many CTLD structures where site 1 is occupied, a metal ion is also found in site 3; examples include the structures of DC-SIGN and DC-SIGNR [53], invertebrate C-type lectin CEL-I [54], lung surfactant protein D [55] and the CTLD of rat aggrecan [56]. It is interesting to note that molecular dynamics simulations of the MBP-A/mannose complex suggested that Ca2+-3 is involved in the binding interaction [57].

Ca-binding site 2 is involved in carbohydrate binding

Residues with carbonyl side chains involved in Ca2+ coordination in site 2 form two characteristic motifs in the CTLD sequence, and together with the calcium atom itself are directly involved in monosaccharide binding. The first group of residues, the ‘EPN motif’ in MBP-A (E185, P186, N187), is contributed by the long loop region and contains two residues with carbonyl side chains separated by a proline in cis conformation. The carbonyl side chains provide two Ca-coordination bonds, form hydrogen bonds with the monosaccharide and determine binding specificity. The cis-proline is highly conserved and maintains the backbone conformation that brings the adjacent carbonyl side chains into the positions required for Ca2+ coordination. The second group of residues, the ‘WND motif’ (positions 204–206), is contributed by the β4 strand.
Although only asparagine and aspartate are involved in Ca-coordination, the tryptophan immediately preceding them is a highly conserved contributor to the hydrophobic core (position β4W [5]) and is a useful landmark for detecting the motif in a sequence. In the MBP-A structure, Asn205 and Asp206 provide three Ca-coordination bonds (two from the side chains, one from the backbone carbonyl of Asp) and also form hydrogen bonds with the sugar. One more carbonyl side chain is involved in site 2 formation. It belongs to the residue preceding the second conserved cysteine at the end of the long loop region (Glu193 in MBP-A), and forms one coordination bond with the Ca2+ ion.

Fig. 4. Ca-binding sites in CTLDs. Shown are ribbon diagrams of two representative CTLD structures, rat MBP-A and human ASGPR-I, demonstrating the four typical locations of calcium ions in the CTLD. Ca2+ ions are shown as black spheres, and numbers referenced to the different sites in the text are indicated next to the arrows.

As no Ca-binding site other than site 2 is known to be involved in sugar binding, and as the site 2 residue motifs can be confidently detected in the sequence, it is common in the literature to associate the predicted Ca2+/carbohydrate binding properties of an uncharacterized sequence with the presence of these motifs (e.g. [7,8]). Although this is a useful simplification, it should be noted that the absence of the motifs associated with Ca2+-binding site 2 does not indicate that the CTLD is incapable of binding Ca2+, as there are two independent sites (1 and 4). Also, the presence of these motifs does not guarantee lectin activity for the CTLD, as there are numerous examples of CTLDs that contain the conserved motifs but are not known to bind monosaccharides (see below).
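The motif-based simplification described above can be illustrated with a minimal Python sketch. It is not one of the published prediction methods; it only scans a sequence for a long-loop EPN motif (or the galactose-type QPD variant mentioned later in connection with the herring antifreeze protein) followed somewhere downstream by a WND motif. The function name, the returned dictionary keys and the toy sequences are hypothetical, and no spacing constraint between the motifs is enforced, which is an assumption of this sketch. As noted in the text, a positive result does not guarantee lectin activity, and a negative result does not exclude Ca2+ binding at sites 1 or 4.

```python
import re

# Long-loop site 2 motifs and the monosaccharide group each is associated with.
LONG_LOOP_MOTIFS = {"EPN": "mannose-type", "QPD": "galactose-type"}
LONG_LOOP_RE = re.compile("|".join(LONG_LOOP_MOTIFS))
WND_RE = re.compile("WND")


def predict_site2(sequence):
    """Report whether both site 2 motifs occur in the expected order.

    Hypothetical helper: positions are 1-based and refer to the input string,
    not to MBP-A numbering.
    """
    seq = sequence.upper()
    for loop in LONG_LOOP_RE.finditer(seq):
        # The WND motif lies on the beta4 strand, C-terminal to the long loop.
        wnd = WND_RE.search(seq, loop.end())
        if wnd:
            return {
                "site2_motifs": True,
                "specificity": LONG_LOOP_MOTIFS[loop.group()],
                "loop_motif_start": loop.start() + 1,
                "wnd_start": wnd.start() + 1,
            }
    return {"site2_motifs": False, "specificity": None}


# Toy sequences (not real proteins), built only to exercise the function.
toy_mannose = "AAA" + "EPN" + "A" * 16 + "WND" + "AAA"
toy_galactose = toy_mannose.replace("EPN", "QPD")
toy_none = "A" * 30

for name, seq in [("mannose", toy_mannose), ("galactose", toy_galactose), ("none", toy_none)]:
    print(name, predict_site2(seq))
```

A real analysis would work on an alignment to known CTLDs, so that motifs are checked at the aligned positions rather than anywhere in the sequence.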
Sites 1, 2 and 4 play structural roles

Despite their spatial proximity, Ca2+-binding sites 1 and 2 should be considered independent from both the evolutionary and the structural points of view. Crystallographic studies of rat MBP-A CTLD crystallized at a low metal ion concentration (0.325 mM Ho3+ instead of the 20 mM used to obtain the CTLD complexed with mannose) have shown that site 1 has higher affinity for Ca2+, as it remains occupied and the Ca2+-coordination geometry is retained while site 2 loses its metal ion [58]. On the other hand, in the fourth CTLD of the human macrophage mannose receptor, Ca2+-binding site 1 is less stable than site 2 [41,59]. This is also the case for the rat pulmonary surfactant protein A (SP-A), where only some of the required ligands for Ca2+-1 are present and these can provide only three coordination bonds to the Ca2+. In one of the two solved SP-A structures (PDB 1r14) both site 1 and site 2 are occupied by metal atoms, while in the other (PDB 1r13) only site 2 is occupied [60]. SP-A is a particularly good example supporting the mutual independence of sites 1 and 2 because in its close homologue, pulmonary surfactant protein D, sites 1, 2 and 3 are occupied by Ca2+ (PDB 1pw9, 1pwb [61]). Independence of Ca2+-binding site 1 is also supported by the fact that in several CTLD structures site 1 is missing, while site 2 contains a calcium ion and is involved in carbohydrate binding. Examples of such structures are human E- and P-selectins (PDB 1esl, 1g1t, 1g1q, 1g1r, 1g1s) [62,63] and tunicate lectin TC14 (PDB 1byf, 1tlg) [64].

Ca2+-binding site 4 was first observed in the structure of the factor IX/X-binding protein from the venom of Trimeresurus flavoviridis, where it was the only location of Ca2+ ions [40]. It is occupied by Ca2+ in several other snake venom CTLD structures. Two observations suggest that this site is a property of the CTLD in general rather than restricted to the snake venom group of CTLDs.
First, it is present in the human asialoglycoprotein receptor I [47], which is a very remote homologue of the snake venom CTLDs. Second, as shown by comparative analysis of CTLD structures [5], Ca2+-4 is involved in a stabilizing interaction that is a highly conserved structural feature observed in virtually all CTLD structures. It can be mediated by salt bridge formation between charged groups and by metal ion coordination. In one structure (galactose-specific C-type lectin from the rattlesnake Crotalus atrox; PDB 1jzn, 1muq [65]) Na+ was found instead of Ca2+ in site 4.

A stabilizing effect of bound Ca2+ on CTLD structure has been reported for a number of proteins from different CTLD groups [52,66,67]. Ca2+ removal greatly increases CTLD susceptibility to proteolysis and changes physical properties of the domain such as circular dichroism spectra and intrinsic tryptophan fluorescence. Structures of the apo forms of human tetranectin [68] and rat MBP-C, and of the one-ion form of rat MBP-A [58], have demonstrated the mechanism underlying these changes. In these structures the compactness of the long loop region is disrupted, leading to multiple conformational changes including a cis-trans isomerization of the conserved proline. However, not all CTLDs require Ca2+ to form a stable long loop region structure. NMR studies of the tunicate CTLD TC14 have shown that its loops maintain their compact fold when Ca2+ is removed [69].

Role of Ca2+ in CTLD function

The most important functional role of the bound Ca2+ in CTLDs is monosaccharide binding. This function is limited to site 2 and is discussed in detail in the following section. However, in several cases, which are described below, Ca2+-binding sites participate in interactions that do not involve carbohydrate recognition. In proteins, Ca2+ is found in 7- or 8-coordinated form.
Because of the metal’s ability to interact simultaneously with multiple ligands within the protein, its binding can orchestrate dramatic rearrangements in the tertiary structure of the protein. At the same time, the reversible nature of the binding and its dependence on different parameters of the milieu (e.g. ion concentration, pH) provide mechanisms to control the structural transformations induced by metal binding.

There are several examples of CTLD functions that are mediated by Ca2+-induced structural changes, namely the destabilization of the long loop region caused by Ca2+ removal, rather than by the involvement of Ca2+ in monosaccharide binding. It is thought that the destabilization of the loops caused by pH-induced Ca2+ loss plays a physiological role in the function of the CTLDs in endocytic proteins such as asialoglycoprotein receptors [52,70] and macrophage mannose receptor [41,59]. Transition of the receptor-ligand complex from the cell surface into the acidic environment of a lysosome leads to Ca2+ loss and to the release of the bound ligand. After release, the ligand is processed by the lysosomal enzymes, while the receptor is recycled to the cell surface.

Another example of a functional CTLD transformation induced by Ca2+ is human tetranectin. Although Ca2+-binding sites 1 and 2 are present in the CTLD of tetranectin, the CTLD is not known to bind carbohydrates. The domain, however, interacts with several kringle domain-containing proteins, including plasminogen, and the interaction involves several residues from the Ca2+-binding site 2. Moreover, the interaction with kringle domain 4 of plasminogen is only possible when Ca2+ is lost from the binding site [71], which leads to changes in the long loop region conformation similar to those observed in the apo-MBP-C [58,68].
The physiological role of Ca2+ as an inhibitor of the tetranectin/plasminogen interaction is, however, unclear.

The antifreeze protein (AFP) from Atlantic herring provides an interesting example of a CTLD in which Ca2+ bound in site 2 is involved in an interaction with a noncarbohydrate ligand [72]. Ewart et al. have shown not only that the antifreeze activity of the protein is Ca2+-dependent [73], but also that it is disrupted by minor changes in the geometry of Ca2+-binding site 2 introduced by replacing the original galactose-type QPD motif with a mannose-type EPN motif [72]. This strongly suggests that Ca2+ site 2 in the herring antifreeze protein interacts directly with the ice crystal, altering its growth pattern.

Ligand binding

CTLDs selectively bind a wide variety of ligands. As the superfamily name suggests, carbohydrates (in various contexts) are primary ligands for CTLDs and the binding is Ca2+-dependent [74]. However, the fold has been shown to specifically bind proteins [75], lipids [76] and inorganic compounds including CaCO3 and ice [72,77–79]. In several cases the domain is multivalent and may bind both protein and sugar [80–82].

Carbohydrate binding is, however, a fundamental function of the superfamily and the best studied one. The first characterized vertebrate CTLDcps were Ca2+-dependent lectins, and most of the functionally characterized CTLDcps from lower organisms were isolated because of their sugar-binding activity. Although it becomes clearer as the number of CTLDcp sequences grows that the majority of them do not possess lectin properties, CTLDcps are still regarded as a lectin family (according to Drickamer, ~85% of C. elegans and 81% of Drosophila CTLDcps are predicted as noncarbohydrate binding [9]).
Unlike many other functions of the CTLDcps, Ca²⁺-dependent carbohydrate binding is found across the whole phylogenetic distribution of the family, from sponges to human, and thus is likely to be the ancestral function. Also, Ca²⁺/carbohydrate-binding CTLDs from different species demonstrate remarkable similarity in the mechanisms of sugar binding. Systematic studies by Drickamer and his colleagues have provided in-depth understanding of many aspects of this mechanism. The results of this theoretical and experimental work established a basis for developing bioinformatics techniques for predicting CTLD sugar-binding properties with substantial reliability by sequence analysis [83]. Whole-genome studies of the CTLD family published by Drickamer and his colleagues focused on the evolution of the carbohydrate-binding properties and used these prediction methods [6–8]. Although our approach for the Fugu rubripes genome was somewhat different [9], for carbohydrate-binding prediction we used the techniques developed by Drickamer and coworkers. An overview of the literature on the mechanism of Ca²⁺-dependent monosaccharide binding by CTLDs is given next.

Ca²⁺-dependent monosaccharide binding

The mechanism of Ca²⁺-dependent monosaccharide binding by several CTLDs has been studied in great detail by X-ray crystallography, site-directed mutagenesis and biochemical methods. The first crystallographic study of a complex between a CTLD and a carbohydrate was carried out on rat MBP-A and the N-glycan Man₆-GalNAc₂-Asn [51]. In the structure obtained, a ternary complex between the terminal mannose moiety of the oligosaccharide, the Ca²⁺ ion bound in site 2 and the protein was observed. The complex is stabilized by a network of coordination and hydrogen bonds: oxygen atoms from the 4- and 3-hydroxyls of the mannose form two coordination bonds with the Ca²⁺ ion and four hydrogen bonds with the carbonyl sidechains that form the Ca²⁺-binding site 2 (Fig. 5). This bonding pattern is fundamental for CTLD/Ca²⁺/monosaccharide complexes, and is observed in all known structures. It is also a major contributor to the binding affinity, especially in CTLDs specific for the mannose group of monosaccharides. For example, in MBP-A, mannose atoms form very few interactions with the protein other than hydrogen/coordination bond formation by the two equatorial hydroxyls, and extensive mutagenesis screening has shown that the only other significant contributor to mannose binding is Cβ from His189, which forms a hydrophobic interaction with the sugar [84].

The positioning of hydrogen donors and acceptors in the binding sites has two important features. First, it determines the overall positioning and orientation of the ligand in the binding site. It may seem from Fig. 5A that the sugar-binding site of CTLDs has a twofold symmetry axis relating the sugar hydroxyls, and that the hypothetical sugar shown can be rotated by 180° without introducing any changes to the bonding scheme. It is now known that this is indeed the case, although some early modeling and mutagenesis studies were based on the assumption that the orientation of the sugar was fixed. When the structure of a complex of rat MBP-C with mannose was determined, the orientation of the bound mannose was opposite to that observed in MBP-A [85], and further studies revealed some of the factors that determine the preferred orientation [86]. Although the rat MBPs are the only established example of a CTLD that can bind carbohydrates in both orientations, it is known that different CTLDs bind the same monosaccharide in different orientations (e.g. galactose-binding MBP-A mutant and CEL-I vs. TC-14 lectin).
The second constraint imposed by the Ca²⁺-coordination site on the ligand determines the properties of the carbohydrate hydroxyls that the site can accept, and this is best demonstrated by the mechanism by which CTLDs discriminate between the mannose group and the galactose group of monosaccharides. As noted previously, early in the history of CTLDs an important correlation was made between the specificity for either galactose or mannose and the residues flanking the conserved cis-proline in the long loop region, which are involved in Ca²⁺-binding site 2 formation. In all mannose-binding proteins known at that time, the sequence of the motif was EPN (E185 and N187 in MBP-A), while in the galactose-specific CTLDs it was QPD. In a series of elegant mutagenesis experiments Drickamer and coworkers showed that replacing the EPN sequence in MBP-A with a galactose-type QPD sequence was enough to switch the specificity to galactose [87], and that further modifications around the binding site (mainly introduction of a properly positioned aromatic ring to form a hydrophobic interaction with the apolar face of the sugar) can increase the affinity and specificity of the mutant MBP-A for galactose to the level observed in natural galactose-binding CTLDs [88].

Crystallographic analysis of the galactose-specific MBP-A mutant showed that the EPN to QPD change does not cause any serious restructuring of the Ca²⁺-binding site 2 geometry [89]; this suggested that the key switch in the specificity was induced by swapping the hydrogen-bond donor and acceptor across the monosaccharide-binding plane and changing the hydrogen-bonding pattern from the mannose-type asymmetrical (Fig. 5A, dark-grey arrows) to galactose-type symmetrical (Fig. 5A, light-grey arrows).

Fig. 5. Ca²⁺-dependent monosaccharide binding by CTLDs. (A) A schematic representation of a Ca²⁺-hexose-CTLD complex. Two hydroxyl oxygens and the ring of the hexose are shown. The Ca²⁺ atom is shown as a large grey sphere, and oxygens as empty circles and ovals. Protein groups that act as hydrogen donors and acceptors are not shown. Arrows show the direction of hydrogen bonds in mannose-specific CTLDs, while light-grey arrows indicate changed directions in galactose-specific CTLDs. (B) A stereoview of the MBP-A complex with mannose (PDB 2msb). Coordination bonds are orange. Hydrogen bonds where sugar hydroxyl acts as acceptor and donor are red and blue, respectively. The Ca²⁺ atom is shown as a blue sphere. [Figure labels: coordination bond; H-bond (donor to acceptor); panels A and B; residues Asn187, Glu185, Asn205, Asn206, Glu193.]

The same distribution of hydrogen-bonding partners was observed in the galactose-binding lectin TC-14 from the tunicate Polyandrocarpa misakiensis [64]. The TC-14 CTLD contains an unusual EPS motif in the long loop region, which is similar to the motifs of the mannose-binding proteins but contains a serine as a hydrogen-bond donor instead of the asparagine in MBP-A. The crystal structure revealed that, due to a compensatory change on the opposite side of the ligand-binding site (the ‘WND’ motif is changed to LDD) and a 180° rotation of the galactose residue compared with the orientation observed in the galactose-binding MBP-A mutant, the symmetrical pattern of the hydrogen bonding is maintained.

Although many of the determinants of the monosaccharide-binding specificity have been established experimentally, the mechanism underlying them is still unclear.
The mutual spatial disposition of bonded hydroxyls, which was initially suggested to be the main contributor to the specificity, is no longer considered so important; a growing number of crystal structures of CTLDs with the MBP-A-like (‘asymmetrical’) distribution of hydrogen-bond donors and acceptors have shown that the core binding site is compatible not only with any two equatorial hydroxyls (3- and 4-OH of mannose and glucose, 2- and 3-OH of fucose), but also with a combination of axial and equatorial hydroxyls (3- and 4-OH of fucose, as in E- and P-selectin structures). A comparative study of different lectin-carbohydrate complexes published by Elgavish and Shaanan [90] suggests that additional stereochemical factors need to be taken into consideration. Elgavish and Shaanan noted the unique clustering of hydrogen-bond donors and acceptors around the 4-OH hydroxyl in all compared structures, which was not observed for other hydroxyls: in a Newman projection along the O4-C4 bond, hydrogen bond acceptors are never gauche to both vicinal ring carbons (C3 and C5), and thus the 4-OH proton is always pointing outside the ring. Poget et al. [64] confirmed this observation and also noted that in CTLDs the same rule holds for the 3-OH proton. However, no explanation of the unique stereochemistry of the 4-OH binding orientation has been offered.

Other contributions to monosaccharide binding affinity and specificity

Although the networks of interactions between the Ca²⁺ ion, the carbonyl residues that coordinate it and the sugar hydroxyls determine the basic binding affinity and specificity for either mannose-type or galactose-type monosaccharides, other structural elements in the binding sites increase the affinity to the level required for efficient binding, impose steric limitations on the orientation of the ligand and introduce selectivity for particular members within the mannose or galactose groups.
Structural determinants of specificity for particular monosaccharides from both mannose and galactose groups were studied by protein engineering on the MBP framework [91,92] and by mutagenesis of several wild-type proteins (mechanisms of discrimination between Glc and GlcNAc by chicken hepatic lectin [93], contribution of His189 to the mannose-binding affinity in MBP-A [84], mutations affecting MBP-A binding of mannose [94], discrimination between GalNAc and Gal by ASGPR [95], increasing the mutant MBP-A affinity towards galactose [88], role of van der Waals interaction with Val351 in fucose recognition by human DC-SIGN [96] and residues affecting pH-dependent ligand release by ASGPR [70]). These additional contributors to binding, however, are variable even between close homologues, which, combined with the inherent plasticity of the core binding site, makes any predictive modeling questionable.

Reliability of Ca²⁺/carbohydrate-binding prediction

As noted above, the molecular mechanism of Ca²⁺-dependent carbohydrate binding is conserved in all family members studied; the amino acids that form the core of the binding sites form characteristic motifs (‘EPN’ and ‘WND’) that can be identified by sequence similarity and are indicative of the binding specificity (mannose vs. galactose). These observations provide a simple and very popular approach to predicting whether a CTLD of unknown function is likely to bind sugar (‘EPN’ and ‘WND’ present) and whether it would preferentially bind mannose- or galactose-type ligands (‘EPN’ vs. ‘QPD’). The technique has proven to be reliable in many cases. However, its development was based on comparison of a limited set of well-characterized domains, whereas the number of uncharacterized sequences to which it is applied is quickly growing, as is the evolutionary distance between the characterized and new sequences.
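As a minimal sketch of the motif heuristic described above, the following Python function looks for a ‘WND’ motif and for an ‘EPN’ or ‘QPD’ motif upstream of it, and reports the predicted specificity. The function name, the result fields and the rule that the site 2 motif need only lie N-terminal to ‘WND’ are illustrative assumptions, not a published implementation; a real analysis would work from a domain alignment, and the toy sequences are invented.

```python
# Motifs taken from the text: EPN (mannose-type), QPD (galactose-type), WND.
# Everything else (names, result layout, "upstream of WND" rule) is hypothetical.
SITE2_MOTIFS = {"EPN": "mannose-type", "QPD": "galactose-type"}
WND_MOTIF = "WND"


def predict_sugar_binding(ctld_seq):
    """Apply the simple EPN/QPD + WND rule to one CTLD sequence."""
    seq = ctld_seq.upper()
    negative = {"binds_sugar": False, "specificity": None, "motif": None}

    # The rule requires WND; use its last occurrence in the domain.
    wnd_pos = seq.rfind(WND_MOTIF)
    if wnd_pos == -1:
        return negative

    # Look for EPN or QPD N-terminal to WND and keep the closest one.
    upstream = seq[:wnd_pos]
    best = None
    for motif, specificity in SITE2_MOTIFS.items():
        pos = upstream.rfind(motif)
        if pos != -1 and (best is None or pos > best[0]):
            best = (pos, motif, specificity)

    if best is None:
        return negative
    return {"binds_sugar": True, "specificity": best[2], "motif": best[1]}


if __name__ == "__main__":
    # Invented toy sequences, for illustration only.
    for name, seq in [("toy_EPN", "AAEPNGGGGWNDAA"),
                      ("toy_QPD", "AAQPDGGGGWNDAA"),
                      ("toy_EPS_LDD", "AAEPSGGGGLDDAA")]:
        print(name, predict_sugar_binding(seq))
```

A domain such as TC-14, which carries EPS and LDD in place of these motifs yet binds galactose, would be scored as non-binding by a rule of this kind, which illustrates why the assumptions behind the approach matter.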
It is therefore important, especially for studies involving large-scale CTLD sequence analysis, to take into account the assumptions on which this approach is based, and its possible limitations.

The three main assumptions are: (a) the presence of Ca²⁺-binding site 2 strongly suggests sugar-binding [...]

... fold, then determining their structures would significantly extend our understanding of the ‘anatomy’ of the domain. On the other hand, there is at least one highly conserved position, the glycine from the WIGL motif, that so far cannot be ascribed any structural role by comparing the X-ray structures alone [5]. Given the very high conservation of this position, ...

... VII has been published. The domain architecture ...

Fig. 6. Domain architecture of vertebrate CTLDcps, with mammalian homologues, from different groups. Group numbers are indicated next to the domain charts. I – lecticans, II – the ASGR group, III – collectins, ...

Link domain or protein tandem repeat (PTR) is a special variety of CTLD, which lacks the long loop region. The major function of Link domains is binding hyaluronan. Although proteins containing it have different domain architecture, their number is small, and they have not been divided into subgroups. Group I CTLDcps contain both canonical and Link-type CTLDs. Other...
... indeed their main function, PLIs are the third group of CTLDcps that independently evolved to support a newly acquired clade-specific function. The abundance and functional diversity of the CTLDcps from these subgroups provides a good example of the ...

... site that has significant affinity only for larger ligands [85]. Interestingly, the monosaccharide bound at the alternative site is in contact with the regions corresponding to the regions labeled in the acorn barnacle lectin study. (d) Although the CTLD of human thrombomodulin does not contain the typical Ca-binding sequence signature, aggregation of melanoma cells ...

... melanogaster and C. elegans) showed that the CTLDcp repertoire of these species is drastically different from each other, and from the known vertebrate groups. On the other hand, the superfamily has undergone very few changes in the 450 million years of vertebrate radiation [9]. These observations have important implications for understanding of the origins and evolution of the functional systems CTLDcps are ...

Fig. 7. Domain architecture of the proteins containing Link domain.

First, it is not absolutely clear what the classification is based on – phylogenetic relationships between CTLDs, or the domain architecture of the proteins containing the CTLDs. Although the latter is generally considered to be the case, even Drickamer’s [2] initial grouping contained a set of CTLDcps with identical domain architecture ...
Table 1 (Continued).
Type I transmembrane proteins with an N-terminal ricin-like domain, a fibronectin type 2 domain and 8 or 10 (Dec205) CTLDs in the extracellular domain, and a short cytoplasmic domain.
Multi-CTLD endocytic receptors.
X-ray structures of NKG-2D (human and mouse) with their MHC-like ligands [197,198], Ly49C [199], ...

... low density lipoprotein receptor (expressed on endothelial cells) or Dectin-1 (macrophages, neutrophils, dendritic cells). Also, there is no obvious distinction between domain structures of the group members which could be used as a basis for ...

... vertebrate CTLDcps is the so-called snake venom CTLDs. Based on their domain architecture (single CTLD with no other domains), these proteins are normally assigned to group VII. However, phylogenetic analysis performed by others [277,278] and ourselves shows that the snake venom CTLD group is phylogenetically heterogeneous, and that its members do not have orthologues among the members of the mammalian CTLD ...

... common meaning of the C-type lectin-like domain does not include these domains [1,6]. Here we will use the term C-type lectin-like domain (CTLD) in its broadest definition to refer to protein domains ...

... and the group of proteins containing the domain is still often called the ‘C-type lectin family’ or ‘C-type lectins’, although most of them are not in fact lectins. The abbreviation CRD ...

Date posted: 30/03/2014, 11:20
