
Identification of novel membrane proteins by searching for patterns in hydropathy profiles

John D. Clements and Rowena E. Martin

School of Biochemistry and Molecular Biology, Australian National University, Canberra, Australia

A technique has been developed to search a proteome database for new members of a functional class of membrane protein. It takes advantage of the highly conserved secondary structure of functionally related membrane proteins. Such proteins typically have the same number of transmembrane domains located at similar relative positions in their polypeptide sequence. This gives rise to a characteristic pattern of peaks in their hydropathy profiles. To conduct a search, each member of a polypeptide database is converted to a hydropathy profile, peaks are automatically detected, and the pattern of peaks is compared with a template. A template was designed for the acetylcholine (ACh) and glycine receptors of the cys-loop receptor superfamily. The key feature was a closely spaced triplet of hydropathy peaks bracketed by deep valleys. When applied to the human proteome, the search procedure retrieved 153 profiles with a receptor-like triplet of peaks. The approach was highly selective, with 70% of the retrieved profiles annotated as known or putative receptors. These included ACh, glycine, γ-aminobutyric acid and serotonin receptors, which are all related by sequence. However, ionotropic glutamate receptors, which have almost no sequence homology with ACh receptors, were also retrieved. Thus, the strategy can find members of a functional class that cannot be identified by sequence alignment. To demonstrate that the strategy can easily be extended to other membrane protein families, a template was developed for the neurotransmitter/Na+ symporter family, and similar results were obtained. This approach should prove a useful adjunct to sequence-based retrieval tools when searching for novel membrane proteins.
Keywords: hydropathy profile; integral membrane protein; ligand-gated channel; neurotransmitter receptor; proteomics; transporter.

Integral membrane proteins are responsible for the majority of interactions between a cell and its external environment. Approximately 20% of the genes in animal, plant, yeast and bacterial genomes encode integral membrane proteins, consistent with their fundamental importance to cellular function [1–3]. Transmembrane α-helices are encoded by a long stretch of predominantly hydrophobic residues (typically 15–19), which is sufficient to cross the hydrophobic region of the membrane bilayer (2.5 nm) [4]. The pronounced compositional bias arises because these residues must be capable of hydrophobic interactions with the lipid environment in the interior of the membrane.

Most membrane-associated domains produce an easily identified peak in the hydropathy profile of the polypeptide. Standard software tools are available that can identify the putative transmembrane domains of a membrane protein based on its hydropathy profile [5,6]. Sophisticated algorithms that combine hydropathy and sequence analysis can predict up to 95% of transmembrane helices [7–12], but simple hydropathy peak detection strategies are also very effective [13].

The primary function of most membrane proteins is to transfer molecules, ions or signals between the exterior and interior of a cell or subcellular compartment, and transmembrane domains provide the physical conduit for the transfer. Typically, several transmembrane domains combine to form a tightly coupled structure that is intimately involved in the function of the protein [14]. It follows that the number and the pattern of transmembrane domains will be strongly conserved within a functionally related family. Protein families within which secondary structure is highly conserved include neurotransmitter receptors, voltage-gated channels, connexins and transporters (Fig. 1).
Correspondence to J. Clements, School of Biochemistry and Molecular Biology, Australian National University, Canberra, ACT 0200, Australia. Fax: +61 26125 0313, Tel.: +61 26125 3465, E-mail: John.Clements@anu.edu.au

Abbreviations: ACh, acetylcholine; AChR, acetylcholine receptor; GABA, γ-aminobutyric acid; HU, hydrophobicity unit; LGIC, ligand-gated ion channel; iGluR, glutamate cationic receptor; NMDA, N-methyl-D-aspartate; AMPA, α-amino-3-hydroxy-5-methyl-4-isoxazole propionate; GlyR, glycine receptor; NSS, neurotransmitter/Na+ symporter.

(Received 31 December 2001, revised 18 February 2002, accepted 27 February 2002)

Eur. J. Biochem. 269, 2101–2107 (2002) © FEBS 2002 doi:10.1046/j.1432-1033.2002.02859.x

The majority of neurotransmitter-activated channels can be assigned either to the glutamate cationic receptor (iGluR) superfamily or to the cys-loop receptor superfamily, which includes acetylcholine (ACh), glycine, γ-aminobutyric acid (GABA) and serotonin receptors [15]. Channels from both superfamilies are formed from subunits that have four membrane-associated domains. These four domains are organized as a cluster of three closely spaced domains near the centre of the polypeptide, and a fourth, well-separated domain close to the C-terminal end of the polypeptide (Fig. 1A) [14]. Despite the similarity of their secondary structure, there is almost no sequence homology between the two superfamilies.

Neuronal voltage-gated Na+, Ca2+ and K+ channel families diverged from a common ancestor long ago, and there is very little sequence homology between the families, yet all three have retained a similar secondary structure. They are formed from four subunits, each containing six membrane-associated domains (Fig. 1B) [14]. In voltage-gated Na+ and Ca2+ channels the four subunits are linked together as a single protein with a series of internal repeats.
In the case of voltage-gated K+ channels, the subunits are expressed as separate proteins, and the channel forms as a tetramer of these subunits (Fig. 1B) [14].

Two separate families of membrane proteins form gap junctions: connexins between mammalian cells, and innexins between invertebrate cells. There is negligible sequence homology between these families, but they share a similar secondary structure. Subunits of both connexins and innexins contain four transmembrane domains, and combine to form dodecamers [14,16–19]. In contrast to ligand-gated channels, the four transmembrane domains of connexin and innexin are organized into two closely spaced pairs, which are separated by an intracellular hydrophilic loop (Fig. 2D). Many other functionally related protein families have been identified in which secondary structural features are better conserved than the underlying amino acid sequences [20,21].

Fig. 1. Schematic diagram showing that the pattern of transmembrane domains is conserved within a functional class of membrane protein. (A) LGICs typically have a closely spaced cluster of three transmembrane domains (dark bars) and a fourth, well-separated domain. This secondary structure is conserved across the cys-loop superfamily and the iGluR superfamily, even though there is no sequence homology between these families. Selected subunits from both families are shown. (B) Distantly related voltage-gated channels also exhibit a characteristic pattern of transmembrane domains. Channels are formed by four groups of six transmembrane domains. Within each group, the first five transmembrane domains are closely spaced, with the sixth domain separated by a relatively long extracellular loop.

Fig. 2. The highly conserved secondary structure of LGICs is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of the human AChR alpha-1 subunit reveals a typical cluster of three peaks bracketed by deep valleys. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. Peaks located within the first 20 residues are likely to be cleaved signal sequences and are ignored. (B,C) A similar pattern of peaks and valleys is seen in the profiles of the GABAA receptor alpha-1 subunit and the glutamate receptor GluR1 subunit. (D) A human connexin subunit also exhibits four hydropathy peaks, but they are organized in a different pattern: the peaks occur in two pairs separated by a deep valley.

Despite clear evidence for conservation of secondary structure, little systematic use has been made of structural information in proteomic analysis. Most genomic software packages can generate a hydropathy profile from an amino acid sequence, but in general they only permit one or a few profiles to be generated at a time. The resulting hydropathy profiles are typically examined by eye for significant features. Efforts have been made to improve and automate this process. For example, the web-based programs TMPRED, TMHMM and MEMSAT identify and count putative transmembrane helices, and suggest their orientation in the membrane [7–11]. These programs are effective when applied to individual amino acid sequences, but no software tools are available to automatically analyse the pattern of putative transmembrane domains (secondary structure). A method for alignment of hydropathy profiles has been developed [20,21], and an experimental web-based server uses this approach to align pairs of sequences submitted by the user, or to search a database for hydropathy profiles that match a submitted sequence (Bioinformatics Unit, Weizmann Institute of Science). At present, it is limited to the SwissProt database and to the Hopp–Woods or Kyte–Doolittle hydrophobicity scales.
In principle, this approach can be used to search for proteins with conserved secondary structure, but there are technical issues that limit its performance. For example, a profile with a similar pattern of peaks, but differently shaped peaks and valleys, may be missed. The method is equally sensitive to mismatches in peak (transmembrane) and valley (intra- and extracellular loop) regions, even though evolutionary changes in valley shape will have relatively little effect on secondary structure.

In this paper we develop and test a new automated proteome search technique. Every member of a polypeptide database is converted to a hydropathy profile, hydropathy peaks are automatically detected, and the pattern of peaks is compared with a template. Sequences that match the template are output to a new database, and their profiles are displayed in a convenient format. This approach can be used to search for new members of a family or functional class of membrane protein. It can assist with functional analysis, and may also be useful in proteome database annotation.

METHODS

An algorithm was developed for searching a large polypeptide sequence database for proteins that are likely to be new members of a functionally related family of membrane proteins. The program runs on a personal computer, and the analysis of an organism's total proteome takes about 1 min. The test is applied to the hydropathy profile of each sequence. A standard (Kyte–Doolittle) algorithm [5,6] is used to convert a sequence into a profile. Each amino acid is assigned a hydropathy value based on experimental measurements, and the resulting profile is filtered to reduce noise. We chose a set of hydropathy values and a filter width that are near-optimal for detection of transmembrane regions [6]. The filter function is a rectangular averaging window (box-car filter) with a length of 17 amino acid residues.
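The profile calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' AxoGraph plug-in. It assumes the standard published Kyte–Doolittle values [5] (the text does not list the exact value set used), a 17-residue box-car window, and a score of 0.0 for non-standard residue codes; the names `KYTE_DOOLITTLE` and `hydropathy_profile` are hypothetical.

```python
from __future__ import annotations

# Standard Kyte-Doolittle hydropathy values (ref. 5), in HU.
# ASSUMPTION: the paper does not list the exact value set it used.
KYTE_DOOLITTLE = {
    "I": 4.5, "V": 4.2, "L": 3.8, "F": 2.8, "C": 2.5,
    "M": 1.9, "A": 1.8, "G": -0.4, "T": -0.7, "S": -0.8,
    "W": -0.9, "Y": -1.3, "P": -1.6, "H": -3.2, "E": -3.5,
    "Q": -3.5, "D": -3.5, "N": -3.5, "K": -3.9, "R": -4.5,
}


def hydropathy_profile(sequence: str, window: int = 17) -> list[float]:
    """Return a box-car (running-mean) hydropathy profile in HU.

    Element j is the mean hydropathy of residues j .. j + window - 1, so it is
    centred on residue j + window // 2 (0-based). Residue codes missing from
    the table (X, B, Z, U, ...) are scored 0.0, which is an assumption.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    values = [KYTE_DOOLITTLE.get(aa, 0.0) for aa in sequence.upper()]
    if len(values) < window:
        return []
    running = sum(values[:window])
    profile = [running / window]
    for j in range(1, len(values) - window + 1):
        # Slide the window by one residue: add the residue entering on the
        # right and subtract the residue leaving on the left.
        running += values[j + window - 1] - values[j - 1]
        profile.append(running / window)
    return profile
```

For example, `hydropathy_profile("R" * 17 + "I" * 17)` returns 18 values rising steadily from -4.5 to 4.5 HU. The running sum keeps the cost linear in sequence length, which matters when every sequence in a proteome is processed.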
With these settings, the amplitude of the peak produced by a transmembrane α-helix is typically in the range 1–3 hydrophobicity units (HU) (Fig. 2). For example, the four transmembrane domains are clearly visible in the hydropathy profiles of three different ligand-gated ion channels (LGICs) (Fig. 2A–C) and of the connexin alpha-1 subunit (Fig. 2D).

Peak detection

Each polypeptide sequence in a database is subjected to a series of three tests. The first test simply rejects the sequence if it is too short or too long. The range of acceptable lengths is determined from known members of the membrane protein family, but this restriction can be relaxed if necessary. Membrane proteins always have both hydrophobic and hydrophilic regions, so profiles that do not cross both an upper and a lower threshold are also rejected. These thresholds are the same as those used for peak detection (Fig. 2).

Next, a simple peak-detection procedure is applied to each hydropathy profile, resulting in an estimate of the number and locations of putative transmembrane helices. The algorithm identifies a peak when the profile rises from below a base threshold, crosses above a peak detection threshold, then crosses back below both the peak and base thresholds. In Fig. 2, the peak and base thresholds are indicated by the upper two dashed lines. Different threshold settings are used depending on the target protein. For example, the base threshold selected for LGICs is higher than that for connexins (Fig. 2A–D). The location and amplitude of each peak are measured at the maximum point between the two peak threshold crossings. The width of each peak is measured between the two base threshold crossings; this gives a more consistent result than measuring the width at the peak threshold level. The location and amplitude of each valley minimum are also measured.
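The screening and peak-detection rules above can be expressed as a small state machine. The Python sketch below is an illustration under stated assumptions, not the published implementation: the names `Peak`, `Valley`, `passes_prescreen`, `detect_peaks` and `find_valleys` are hypothetical; the prescreen's upper and lower thresholds are assumed to be the peak and valley thresholds; and, because the text does not define valley boundaries precisely, each valley is taken as the profile minimum between adjacent peak maxima (plus the stretches before the first and after the last peak). Positions and widths are profile indices.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Peak:
    position: int     # profile index of the maximum
    amplitude: float  # profile value (HU) at the maximum
    width: int        # samples between the two base-threshold crossings


@dataclass
class Valley:
    position: int
    amplitude: float


def passes_prescreen(sequence: str, profile: list[float], min_len: int,
                     max_len: int, upper: float, lower: float) -> bool:
    """Reject sequences of the wrong length, and profiles that do not both
    rise above `upper` and fall below `lower` (ASSUMED threshold choice)."""
    if not min_len <= len(sequence) <= max_len:
        return False
    return bool(profile) and max(profile) > upper and min(profile) < lower


def detect_peaks(profile: list[float], peak_thr: float,
                 base_thr: float) -> list[Peak]:
    """A peak is an excursion that starts below base_thr, rises above
    peak_thr and falls back below base_thr. Excursions already in progress
    at the start of the profile, or unfinished at its end, are ignored."""
    peaks: list[Peak] = []
    armed = False        # becomes True once the profile has been below base
    start = None         # index where the current excursion rose above base
    reached_peak = False
    for i, v in enumerate(profile):
        if v < base_thr:
            if start is not None and reached_peak:
                # The maximum of the whole excursion is the maximum between
                # the two peak-threshold crossings.
                segment = profile[start:i]
                k = max(range(len(segment)), key=segment.__getitem__)
                peaks.append(Peak(start + k, segment[k], i - start))
            armed, start, reached_peak = True, None, False
        elif armed:
            if start is None:
                start = i
            if v > peak_thr:
                reached_peak = True
    return peaks


def find_valleys(profile: list[float], peaks: list[Peak]) -> list[Valley]:
    """Minimum before the first peak, between each adjacent pair of peaks,
    and after the last peak (len(peaks) + 1 valleys)."""
    if not profile:
        return []
    bounds = [0] + [p.position for p in peaks] + [len(profile)]
    valleys = []
    for lo, hi in zip(bounds, bounds[1:]):
        k = min(range(lo, hi), key=profile.__getitem__)
        valleys.append(Valley(k, profile[k]))
    return valleys
```

For the LGIC search described in the Results, the corresponding call would be `detect_peaks(profile, peak_thr=1.1, base_thr=0.8)`.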
Comparing a profile to a template

After the peaks and valleys are identified, a test is performed to determine whether they conform to a template. The simplest test is to count the peaks and ask whether this number falls within a specified range. The peak count may be adjusted by rejecting narrow peaks, or by counting a broad peak as two merged peaks. For example, when the base threshold is set below zero, the majority of transmembrane regions will produce a peak that is wider than 10 residues. If the width of a peak is > 30 residues, it is possible that two or more closely spaced transmembrane regions have produced a single peak in the hydropathy profile. A peak located within the first 20 residues is likely to be a signal sequence (destined in most cases to be cleaved from the mature protein), and can optionally be removed from the peak count (Fig. 2A). Sometimes a false hydropathy peak is detected at a location that is not a transmembrane domain, and true transmembrane peaks are occasionally missed. Thus, when searching for proteins with four transmembrane domains, a profile with three to five peaks would typically be accepted.

If the number of peaks falls within the specified range, then more sophisticated template-matching tests can be applied. For example, the separation between adjacent peaks (interpeak intervals) can be calculated, and a candidate profile can be rejected if the interpeak intervals fall outside the specified ranges. Another strategy is to scan for a particular feature, such as a closely spaced cluster of peaks bracketed by deep valleys. A strategy of this type is developed below for detecting ligand-gated ion channels.

Designing and refining a template

When designing a search strategy, the peak detection thresholds and the selection parameters are adjusted with the dual goals of maximizing detection and minimizing false positives.
The first goal is achieved by applying the algorithm to a sequence database containing all proteins that belong to the family of interest. The parameters are refined by trial and error until almost all members of the family are selected. Next, the same set of search parameters is applied to a database containing unrelated membrane protein sequences. If necessary, the parameters are fine-tuned until all members of the unrelated family are rejected. Finally, the search procedure is applied to a large database, for example one containing the proteome of an organism.

The search algorithm and several related utilities were written using a development environment that is built into AxoGraph (Axon Instruments, CA), a scientific data analysis and graphics program for Macintosh computers (http://www.axon.com/CN_AxoGraph4.html). The AxoGraph plug-in programs that implement the search algorithm are available on request, or from http://johnc3.anu.edu.au/proteomic_plugins.sea. AxoGraph was chosen for this study because it can plot and overlay several thousand hydropathy profiles in a single window, and analyse them in a single operation. It also has convenient features for browsing and organizing the large number of profiles generated by the search algorithm.

RESULTS

A search strategy was designed for LGICs. The strategy was refined by applying it to custom polypeptide databases, and tested by applying it to a database containing the complete human proteome. This database was chosen because it is well annotated, which aids in the assessment of the algorithm's performance. The results presented below are essentially a proof of concept. In general, this technique will be more useful when applied to a database that is not complete or well annotated.

Search strategy for LGICs

The following procedure was used to develop the search strategy for LGICs. First, a custom database containing sequences from two members of the cys-loop receptor superfamily was constructed.
ACh receptors (AChRs) and glycine receptors (GlyRs) were selected using a text search of the Entrez database. Truncated sequences, duplicate sequences and sequences that were not LGICs were removed manually. This left 119 unique, full-length sequences from many different animal species (including human, chicken, frog, fish, locust, fruit fly and nematode), which were converted to hydropathy profiles in AxoGraph. Features common to all of the profiles were identified by eye, aided by AxoGraph's browsing features. Every profile had a cluster of three peaks located approximately 200–300 residues from the start of the sequence (Fig. 2A). Each of the three peaks had an amplitude of 1–2 HU, and the cluster of peaks was bracketed by deep valleys extending below −2.5 HU. The cluster of three peaks was followed by a fourth peak close to the end of the profile.

Based on these observations, and following a period of trial-and-error refinement, the following selection criteria were chosen. Only sequences with lengths between 300 and 1800 residues were accepted. A peak threshold of 1.1 HU and a base threshold of 0.8 HU reliably detected all four peaks in every profile. However, because the base threshold was set relatively high, some peaks were measured as very narrow (only two residues wide); therefore, narrow peaks were not rejected. A putative transmembrane domain occasionally appeared as two narrow peaks, so a pair of peaks separated by fewer than six residues was counted as a single peak. We noted that the first and last peaks in the characteristic cluster were separated by between 55 and 66 residues. Thus, the template criterion for an LGIC was the presence of a cluster of three peaks whose first and last members were separated by between 50 and 75 residues, bounded by deep valleys of < −2.5 HU. The cluster had to be followed by at least one, but no more than three, additional peaks.
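The LGIC template can be written as a short predicate over a profile and the positions of its detected peak maxima. The sketch below is a hypothetical reconstruction for illustration only. The function names are invented. "Separated by fewer than six residues" is interpreted as the distance between peak maxima, with the taller peak kept when two are merged. Peaks in the first 20 positions are discarded as putative signal sequences, following the Fig. 2 legend. Peak positions are treated as residue numbers, ignoring the half-window offset of a smoothed profile.

```python
from __future__ import annotations


def clean_peaks(profile: list[float], positions: list[int],
                signal_cutoff: int = 20, min_gap: int = 6) -> list[int]:
    """Drop putative signal-sequence peaks and merge near-duplicate peaks."""
    kept: list[int] = []
    for pos in sorted(positions):
        if pos < signal_cutoff:
            continue  # probably a cleaved signal sequence
        if kept and pos - kept[-1] < min_gap:
            # ASSUMPTION: two maxima < min_gap apart are one TM domain;
            # keep whichever maximum is taller.
            if profile[pos] > profile[kept[-1]]:
                kept[-1] = pos
            continue
        kept.append(pos)
    return kept


def matches_lgic_template(profile: list[float], positions: list[int],
                          span: tuple[int, int] = (50, 75),
                          valley_max: float = -2.5,
                          trailing: tuple[int, int] = (1, 3)) -> bool:
    """True if some run of three consecutive peaks has its first and last
    maxima `span` residues apart, is bracketed on both sides by a valley
    deeper than `valley_max` HU, and is followed by 1-3 further peaks."""
    peaks = clean_peaks(profile, positions)
    n = len(peaks)
    for i in range(n - 2):
        first, third = peaks[i], peaks[i + 2]
        n_after = n - (i + 3)
        if not span[0] <= third - first <= span[1]:
            continue
        if not trailing[0] <= n_after <= trailing[1]:
            continue
        # Outer valleys: deepest point between the neighbouring peak (or the
        # profile end) and the triplet on each side.
        left = peaks[i - 1] + 1 if i > 0 else 0
        right = peaks[i + 3] if n_after > 0 else len(profile)
        valley_before = min(profile[left:first], default=float("inf"))
        valley_after = min(profile[third + 1:right], default=float("inf"))
        if valley_before < valley_max and valley_after < valley_max:
            return True
    return False
```

Scanning every run of three consecutive peaks, rather than a fixed index, lets a profile with an extra leading peak still match, as long as the triplet and its trailing peaks satisfy the criteria.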
Testing the LGIC search strategy

A search of the AChR and GlyR database using the above detection criteria correctly retrieved every one of the 119 profiles. Thus, the search strategy exhibits excellent sensitivity, as it was able to detect 100% of known GlyRs and AChRs across a range of species. The accuracy and sensitivity of the search strategy were tested by applying it to a custom database containing GABAA receptor sequences retrieved via a text search of the Entrez database. GABAA receptors are also members of the cys-loop superfamily, but they were not used during the selection and tuning of the search parameters. The algorithm retrieved 39 out of 41 sequences (95%), demonstrating excellent sensitivity for proteins that are related in both function and sequence to the target group.

Next, the selectivity of the search strategy was examined. We chose two families of integral membrane proteins that are functionally distinct from LGICs, but that also have four transmembrane domains. A custom database of known and putative connexins and innexins was constructed using a series of text searches of the Entrez database. The search algorithm was applied to the database and retrieved only one out of 122 sequences. Thus, the LGIC search strategy exhibits good selectivity.

The entire human proteome (Entrez) was searched and 153 profiles with a receptor-like triplet of peaks were retrieved. Of these, 105 (70%) were annotated as known or putative receptors. As expected, many of these were GlyRs or AChRs (31). Other members of the cys-loop superfamily were also identified, including receptors for GABA (18) and serotonin (5). Of particular note, 13 members of the iGluR superfamily were also retrieved, including the N-methyl-D-aspartate (NMDA) and kainate receptor subtypes. Thus, the search algorithm succeeded in its central goal of identifying proteins that were functionally related to the target group (GlyRs and AChRs), but were not related by sequence homology.
Of the profiles that were not annotated as receptors, six were voltage-gated potassium channels and two were transporters. They were retrieved because they contained six or seven transmembrane domains, three of which formed a cluster separated by deep valleys (Fig. 3A). It was noted that the valleys between the triplet peaks were usually deeper for potassium channels and transporters than for LGICs. The receptor detection algorithm was therefore refined to eliminate profiles in which the deeper of the two valleys between the triplet peaks extended below −1.5 HU. This refined algorithm was still able to detect 99% of known GlyRs and AChRs. It retrieved 87 profiles from the human proteome, of which 90% were receptors. Although this refined search procedure increased the selectivity for receptors, it also failed to retrieve any iGluRs. This illustrates the inevitable trade-off between the selectivity of the search algorithm and the likelihood of detecting distantly related functional homologues.

The sensitivity of the search strategy to membrane proteins that were related to the target group by function but not by sequence was investigated further. A custom database containing 84 sequences from the iGluR superfamily was constructed using Entrez. It included the NMDA, kainate and α-amino-3-hydroxy-5-methyl-4-isoxazole propionate (AMPA) receptor subtypes. These receptors are functionally related to GlyRs and AChRs, but share almost no sequence homology with them. Also, iGluRs are thought to form tetrameric channels, in contrast with the cys-loop superfamily, which forms pentameric channels. Despite these differences, the search algorithm retrieved 30 sequences (36%) from the iGluR database. By subtype, 90% of the kainate receptors in the database were detected, but only 36% of the NMDA receptors and 1% of the AMPA receptors.
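The valley-depth refinement for the receptor template amounts to one further check on a triplet that has already matched. In the hypothetical helper below (its name and the triplet-of-positions input are assumptions, not the authors' code), each inner valley is the profile minimum between two adjacent triplet maxima, and the triplet is rejected when the deeper of the two falls below −1.5 HU.

```python
from __future__ import annotations


def inner_valleys_ok(profile: list[float], triplet: tuple[int, int, int],
                     floor: float = -1.5) -> bool:
    """Reject the triplet if the deeper of its two inner valleys extends
    below `floor` (HU); otherwise accept it."""
    a, b, c = triplet
    if not a < b < c:
        raise ValueError("triplet positions must be strictly increasing")
    first_valley = min(profile[a:b + 1])
    second_valley = min(profile[b:c + 1])
    return min(first_valley, second_valley) >= floor
```

Because this is a single extra parameter, it makes the selectivity trade-off easy to explore: raising or lowering `floor` moves the balance between excluding potassium channels and transporters and retaining distant functional homologues such as the iGluRs.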
Examination of the AMPA receptor hydropathy profiles revealed that the peak associated with their second membrane-associated domain did not reach the peak threshold in most cases. A small reduction in this threshold would have resulted in many more AMPA and NMDA receptors being retrieved. Nevertheless, these results demonstrate the remarkable sensitivity of the original search strategy for membrane proteins that are related to AChRs only by function.

Candidate LGICs retrieved by the search strategy

Four proteins with receptor-like profiles from the second search were annotated as having no known or putative function. In principle, these could be novel receptors, so we examined them in greater detail. The profile with accession number AAF86374 is a member of the ancient conserved domain protein (ACDP) family, which has sequence elements conserved from nematode to human. Intriguingly, its secondary structure is very similar to that of an LGIC, with a clear triplet of peaks followed by a well-separated fourth peak (Fig. 3B). It has a shorter section preceding the triplet than a typical receptor, but it is reasonable to speculate that it is a membrane protein, and possibly an ancient ion channel or receptor. The next two profiles came from an uncharacterized membrane protein expressed in the hypothalamus (accession numbers NP_060945 and AAG09678). These proteins had six or possibly seven transmembrane domains and are unlikely to be receptors, but could be novel transporters or voltage-gated channel subunits (Fig. 3C). The profile BAA18909 is simply annotated 'unknown', but a BLAST search revealed weak homology with a section of an intrinsic factor-vitamin B12 receptor. The profile is quite similar to that of a typical LGIC, although a small narrow peak precedes the main triplet (Fig. 3D). These findings demonstrate how the hydropathy peak detection algorithm may be used to search for truly novel members of a functional class of membrane proteins.

Fig. 3. Hydropathy profiles of four proteins that were retrieved from the human proteome by a search strategy designed to detect LGICs, but were not annotated as receptors. (A) A voltage-gated potassium channel was incorrectly retrieved because its first two hydropathy peaks fell just below the detection threshold. Potassium channels typically have a cluster of five peaks followed by a sixth, well-separated peak. Note that although only one peak following the valley is highlighted, the template will accept up to three peaks. (B) An ancient conserved domain protein with no known function was retrieved because of its receptor-like cluster of three transmembrane peaks bracketed by deep valleys. The separation between the cluster and the fourth peak was larger than for a typical LGIC, but otherwise the secondary structure is strikingly similar. (C) An uncharacterized hypothalamus protein is unlikely to be an LGIC, despite the fact that it is expressed in a brain region. It has two or three extra peaks before and after the triplet, giving it a secondary structure that has more in common with a voltage-gated channel or a transporter. (D) A retrieved protein that was simply annotated 'unknown', but which has weak sequence homology with an intrinsic factor-vitamin B12 receptor.

Search strategy for neurotransmitter/Na+ symporters

To demonstrate that our approach can be applied to other functional classes of membrane protein, we developed a search strategy for the neurotransmitter/Na+ symporter (NSS) family. A custom database was constructed containing 40 GABA and dopamine transporters, which have 10–12 putative transmembrane domains. The corresponding peaks in the transporter profiles could be detected using a peak threshold of 1.4 HU and a base threshold of 0.6 HU.
The minimum peak width was set to 10 residues, and peaks up to 60 residues wide were accepted. Profiles were accepted only if they had between 10 and 13 peaks, arranged as a pair of peaks, followed by a deep valley (< −1.9 HU), then a cluster of 8–11 peaks extending over no more than 300 residues (Fig. 4A,B). It is likely that the initial pair of peaks actually represents three transmembrane domains: the second peak was typically 40 residues wide, and is probably produced by two closely spaced transmembrane domains. This search strategy identified all 40 of the targeted NSS transporter profiles.

The entire human proteome (Entrez) was searched and 59 profiles with an NSS transporter-like pattern of peaks were retrieved. Of these, 51 (86%) were annotated as known or putative transporters. As expected, many of these were NSS transporters (54%), but several other transporters were also identified, including Na+/Ca2+ antiporters (9%), Na+/glucose symporters (7%), K+/Cl− symporters (5%), Na+/nucleoside transporters (3%) and organic ion transporters (3%) (Fig. 4C). Thus, the search algorithm again succeeded in identifying proteins that were functionally related to the target group, but were not related by sequence homology.

DISCUSSION

We have developed and tested an algorithm that can scan a large polypeptide database and retrieve membrane proteins on the basis of secondary structure rather than sequence homology. The algorithm locates putative transmembrane domains in each sequence, and tests whether their spatial pattern matches a template. In the past this process has been performed manually, by visual inspection of hydropathy plots generated one at a time. Our major innovation was to automate the process and apply it on the proteome scale. A computer program performs the peak detection and template matching. The complete proteome of an organism can be scanned in about 1 min using a desktop personal computer.
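The proteome-scale loop itself is simple; the work lies in the per-sequence tests. The following is a minimal Python sketch of that outer loop, assuming FASTA input and output (the original tool ran as AxoGraph plug-ins, whose data handling is not described here). `read_fasta`, `scan_proteome` and the placeholder predicate `is_candidate` are hypothetical; in a full pipeline the predicate would chain the length prescreen, hydropathy profile, peak detection and template tests.

```python
from __future__ import annotations

import io
from typing import Callable, Iterable, Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; lines before the first '>' are ignored."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def scan_proteome(records: Iterable[tuple[str, str]],
                  matches: Callable[[str], bool], out: TextIO,
                  line_width: int = 60) -> int:
    """Write every record whose sequence satisfies `matches` to `out` as
    FASTA (the 'new database') and return the number of hits."""
    hits = 0
    for header, seq in records:
        if matches(seq):
            out.write(f">{header}\n")
            for i in range(0, len(seq), line_width):
                out.write(seq[i:i + line_width] + "\n")
            hits += 1
    return hits


def is_candidate(seq: str) -> bool:
    # Placeholder predicate: only the LGIC length window (300-1800 residues).
    return 300 <= len(seq) <= 1800


if __name__ == "__main__":
    demo = ">long\n" + "M" * 400 + "\n>short\n" + "M" * 50 + "\n"
    out = io.StringIO()
    n = scan_proteome(read_fasta(io.StringIO(demo)), is_candidate, out)
    print(n, out.getvalue().splitlines()[0])  # prints: 1 >long
```

Streaming records through a generator keeps memory use flat, so the cost of a whole-proteome scan is dominated by the per-sequence tests rather than by I/O.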
Automated, proteome-scale analysis represents a qualitative increase in the power of the technique, and it permits new questions to be addressed. An analogy may be drawn with modern sequence-based search programs such as BLAST, which can scan multiple genomes. Although BLAST was directly based on earlier sequence analysis programs that could align small groups of sequences, its development opened an entirely new field.

In principle, our technique could be extended by complementing hydropathy peak detection with a more sophisticated analysis of the underlying sequence [8–12]. Several web-based programs use such an approach to improve the reliability with which transmembrane domains can be identified, and to predict topology. Incorporating additional sequence analysis into our technique would permit an orientation to be assigned to each transmembrane α-helix, which would assist structural analysis. However, the additional processing would substantially slow the search, and it is unclear how much improvement would be achieved in practice. A recent study evaluated all of the current methods for predicting transmembrane domains and found TMHMM to be the best-performing program [13]. However, the standard Kyte–Doolittle algorithm, which forms the basis of our search technique, was a close runner-up. Some membrane proteins incorporate a hydrophobic pore-lining region that does not cross the membrane, but instead forms a beta hairpin structure that dips into the membrane and then re-emerges on the same side [22]. These membrane-associated domains represent an important component of the highly conserved secondary structure of voltage-gated potassium channels, and similar hairpin structures may also be present in other membrane proteins [22]. A sophisticated α-helix-detection algorithm may reject or misinterpret such regions.

Fig. 4. The conserved secondary structure of neurotransmitter/Na+ symporters is reflected in a characteristic pattern of peaks in their hydropathy profiles. (A) The hydropathy profile of a rat dopamine symporter reveals a pair of peaks followed by a deep valley, then a cluster of nine peaks. The peak, base and valley threshold levels used by the search algorithm are shown as horizontal dashed lines. (B) A similar pattern of peaks and valleys is seen in the profile of a closely related rat GABA symporter. (C) A human Na+-independent organic anion transporter retrieved by the NSS symporter template exhibits a similar pattern of peaks, although it has no sequence homology with the neurotransmitter symporters.

Our approach is loosely analogous to a strategy that uses alignment of hydropathy profiles to search for conserved secondary structural features in polypeptide sequences [20,21]. This alignment technique is based on the same algorithm that is used in standard peptide and nucleotide sequence alignment, but is applied to sequences of hydropathy values. Profile alignment will generally provide a more stringent test for conserved structure than our template-matching approach. However, a more stringent test will be less likely to detect unusual or distantly related family members. For example, an LGIC containing a triplet of unusually high hydropathy peaks will be reliably detected by our approach, but will receive a low score in an alignment-based search. Another problematic issue for the alignment algorithm is what penalty should be assigned when introducing gaps into one or both profiles, and how this penalty should be weighted for transmembrane domains vs. extra-membrane loops. We tested the performance of the hydropathy alignment approach by submitting the sequence of the GlyR alpha-1 subunit to the web-based search engine http://bioinformatics.weizmann.ac.il/hydroph/, and analysing the first 200 sequences retrieved from the SwissProt database.
Of the 200 sequences retrieved by hydropathy alignment, only 43% were annotated as receptors, and all of these receptors were close relatives of the AChR (ACh, glycine and GABA receptors). No receptors for serotonin or glutamate were identified. Thus, hydropathy alignment is much less sensitive to distantly related functional homologues, and less selective for the membrane protein family of interest, than the template-matching approach.

We chose the human genome to test our search strategy because its thorough annotation permitted a detailed assessment of the algorithm's performance. In practice, the hydropathy profile search tool will be more useful when applied to an actively growing proteome database that is not yet well annotated. The most important use for the technique will be to search for new members of established functional families of membrane proteins, especially those that are missed by standard sequence-based search techniques. We have demonstrated how this can be achieved for LGICs and for neurotransmitter symporters. Other candidate families include voltage-gated ion channels, G-protein-coupled receptors, connexins and a wide variety of transporters.

ACKNOWLEDGEMENTS

This work was supported by a Senior Research Fellowship from the Australian Research Council (J. D. C.) and an Australian Postgraduate Award (R. E. M.).

REFERENCES

1. Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.C. & Herrmann, R. (1996) Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae. Nucleic Acids Res. 24, 4420–4449.
2. Frishman, D. & Mewes, H.W. (1997) Protein structural classes in five complete genomes. Nat. Struct. Biol. 4, 626–628.
3. Wallin, E. & von Heijne, G. (1998) Genome-wide analysis of integral membrane proteins from eubacterial, archaean, and eukaryotic organisms. Protein Sci. 7, 1029–1038.
4. Deisenhofer, J., Remington, S.J. & Steigemann, W. (1985) Experience with various techniques for the refinement of protein structures. Methods Enzymol. 115, 303–323.
5. Kyte, J. & Doolittle, R.F. (1982) A simple method for displaying the hydropathic character of a protein. J. Mol. Biol. 157, 105–132.
6. Engelman, D.M., Steitz, T.A. & Goldman, A. (1986) Identifying nonpolar transbilayer helices in amino acid sequences of membrane proteins. Annu. Rev. Biophys. Biophys. Chem. 15, 321–353.
7. Jones, D.T., Taylor, W.R. & Thornton, J.M. (1994) A model recognition approach to the prediction of all-helical membrane protein structure and topology. Biochemistry 33, 3038–3049.
8. Rost, B., Casadio, R., Fariselli, P. & Sander, C. (1995) Transmembrane helices predicted at 95% accuracy. Protein Sci. 4, 521–533.
9. Cserzo, M., Wallin, E., Simon, I., von Heijne, G. & Elofsson, A. (1997) Prediction of transmembrane alpha-helices in prokaryotic membrane proteins: the dense alignment surface method. Protein Eng. 10, 673–676.
10. Sonnhammer, E.L., von Heijne, G. & Krogh, A. (1998) A hidden Markov model for predicting transmembrane helices in protein sequences. Proc. Int. Conf. Intell. Syst. Mol. Biol. 6, 175–182.
11. Tusnady, G.E. & Simon, I. (1998) Principles governing amino acid composition of integral membrane proteins: application to topology prediction. J. Mol. Biol. 283, 489–506.
12. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. (2001) Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J. Mol. Biol. 305, 567–580.
13. Moller, S., Croning, M.D. & Apweiler, R. (2001) Evaluation of methods for the prediction of membrane spanning regions. Bioinformatics 17, 646–653.
14. Hille, B. (1992) Ionic Channels of Excitable Membranes, 2nd edn. Sinauer Associates, Sunderland, MA.
15. Le Novere, N. & Changeux, J.P. (2001) LGICdb: the ligand-gated ion channel database. Nucleic Acids Res. 29, 294–295.
16. Landesman, Y., White, T.W., Starich, T.A., Shaw, J.E., Goodenough, D.A. & Paul, D.L. (1999) Innexin-3 forms connexin-like intercellular channels. J. Cell Sci. 112, 2391–2396.
17. Unger, V.M., Kumar, N.M., Gilula, N.B. & Yeager, M. (1999) Three-dimensional structure of a recombinant gap junction membrane channel. Science 283, 1176–1180.
18. Bennett, M.V., Barrio, L.C., Bargiello, T.A., Spray, D.C., Hertzberg, E. & Saez, J.C. (1991) Gap junctions: new tools, new answers, new questions. Neuron 6, 305–320.
19. Ganfornina, M.D., Sanchez, D., Herrera, M. & Bastiani, M.J. (1999) Developmental expression and molecular characterization of two gap junction channel proteins expressed during embryogenesis in the grasshopper Schistocerca americana. Dev. Genet. 24, 137–150.
20. Lolkema, J.S. & Slotboom, D.J. (1998) Estimation of structural similarity of membrane proteins by hydropathy profile alignment. Mol. Membr. Biol. 15, 33–42.
21. Lolkema, J.S. & Slotboom, D.J. (1998) Hydropathy profile alignment: a tool to search for structural homologues of membrane proteins. FEMS Microbiol. Rev. 22, 305–322.
22. Wood, M.W., VanDongen, H.M. & VanDongen, A.M. (1995) Structural conservation of ion conduction pathways in K channels and glutamate receptors. Proc. Natl. Acad. Sci. USA 92, 4882–4886.
