REVIEW ARTICLE
The bile ⁄ arsenite ⁄ riboflavin transporter (BART) superfamily
Nahla M. Mansour*, Mrinalini Sawhney, Dorjee G. Tamang, Christian Vogl and Milton H. Saier Jr
Division of Biological Sciences, University of California at San Diego, La Jolla, CA, USA
Over the years, our research group has developed a
classification system universally applicable to all trans-
membrane transporters found in living organisms on
Earth [1,2]. This system, adopted by the International
Union of Biochemistry and Molecular Biology
(IUBMB) in 2003, currently includes about 400 famil-
ies of transporters of various types [3]. As a result of
the development of sensitive software (gap [4], ic [5]),
some of these families have been shown to be distantly
related by common descent, and hence they comprise
superfamilies [6].
The importance of family and superfamily assign-
ment is emphasized by the fact that structural, func-
tional and mechanistic data for transporters can be
extrapolated from one protein to another if, and only
if, they have been shown to be related by common des-
cent [1,2]. Further, the degree to which one can extra-
polate data from one protein to another is inversely
Keywords
arsenite; bile acids; cyclic di-GMP
metabolism; intragenic duplication;
phylogeny; regulation; riboflavin; secondary
carriers; topology; transporter
Correspondence
M. H. Saier Jr, Division of Biological
Sciences, University of California at San
Diego, La Jolla, CA 92093-0116, USA
Fax: +1 858 534 7108
Tel: +1 858 534 4084
E-mail: msaier@ucsd.edu
*Present address
Vaccines & Recombinant DNA Technology
Lab, Nobel Project, NRC, Egypt
(Received 21 September 2006, revised
16 November 2006, accepted 4 December
2006)
doi:10.1111/j.1742-4658.2006.05627.x
Secondary transmembrane transport carriers fall into families and super-
families allowing prediction of structure and function. Here we describe
hundreds of sequenced homologues that belong to six families within a
novel superfamily, the bile ⁄ arsenite ⁄ riboflavin transporter (BART) superfamily,
of transport systems and putative signalling proteins. Functional
data for members of three of these families are available, and they transport
bile salts and other organic anions, the bile acid:Na+ symporter
(BASS) family, inorganic anions such as arsenite and antimonite, the arse-
nical resistance-3 (Acr3) family, and the riboflavin transporter (RFT) fam-
ily. The first two of these families, as well as one more family with no
functionally characterized members, exhibit a probable 10 transmembrane
spanner (TMS) topology that arose from a tandemly duplicated 5 TMS
unit. Members of the RFT family have a 5 TMS topology, and are homol-
ogous to each of the repeat units in the 10 TMS proteins. The other two
families [sensor histidine kinase (SHK) and kinase ⁄ phosphatase ⁄ synthe-
tase ⁄ hydrolase (KPSH)] have a single 5 TMS unit preceded by an N-ter-
minal TMS and followed by a hydrophilic sensor histidine kinase domain
(the SHK family) or catalytic domains resembling sensor kinase, phospha-
tase, cyclic di-GMP synthetase and cyclic di-GMP hydrolase catalytic
domains, as well as various noncatalytic domains (the KPSH family).
Because functional data are not available for members of the SHK and
KPSH families, it is not known if the transporter domains retain transport
activity or have evolved exclusive functions in molecular reception and sig-
nal transmission. This report presents characteristics of a unique protein
superfamily and provides guides for future studies concerning structural,
functional and mechanistic properties of its constituent members.
Abbreviations
aas, amino acyl residues; Acr3, arsenical resistance-3; BASS, bile acid:Na+ symporter; BART, bile ⁄ arsenite ⁄ riboflavin transporter; HATPase-c,
histidine kinase-like ATPase; KPSH, kinase ⁄ phosphatase ⁄ synthetase ⁄ hydrolase; RFT, riboflavin transporter; SD, standard deviations;
SHK, sensor histidine kinase; TCDB, Transporter Classification Database; TMS, transmembrane spanner; UNK, unknown.
FEBS Journal 274 (2007) 612–629 © 2007 The Authors. Journal compilation © 2007 FEBS
related to their phylogenetic distances [7–11]. Import-
antly, bioinformatic procedures can reveal the evolu-
tionary pathways taken for the appearance of the
proteins [12,13].
The criterion we have been using for the establish-
ment of homology is a comparison score of nine stand-
ard deviations (SD) or greater using the gap and ic
programs. These programs correct for unusual or
restricted amino acyl residue compositions as occur in
integral membrane proteins. The value of nine SD corresponds
to a probability of 10⁻¹⁹ that the degree of
sequence similarity observed could have occurred by
chance [1,14]. This is a highly reliable criterion of
homology that is far more rigorous than most other
criteria currently in use by the scientific community.
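
As a rough check on the probability quoted above, the upper-tail probability of a normal distribution at nine standard deviations can be computed directly. This is only a sketch: it assumes that comparison scores of shuffled sequences are approximately normally distributed, which is an assumption made here for illustration, and the function name is hypothetical rather than part of the gap or ic programs.

```python
import math


def upper_tail_probability(z: float) -> float:
    """One-sided probability that a standard normal variable exceeds z."""
    # erfc remains accurate far out in the tail, where 1 - cdf(z) would round to 0.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


if __name__ == "__main__":
    for z in (7.0, 9.0, 15.0):
        print(f"{z:4.1f} SD -> p = {upper_tail_probability(z):.2e}")
```

Under this assumption, nine SD gives a value on the order of 10⁻¹⁹, in line with the figure cited in the text.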
One recently identified superfamily was shown to
include the bile acid:Na+ symporter (BASS) family
(TC #2.A.28) and the arsenical resistance-3 (Acr3)
family (TC #2.A.59) [6]. However, except for a brief,
outdated description of the BASS family in 1999 [15]
and the establishment of a common origin for these
two families [6], the characterization of this small
superfamily had not been reported previously.
We have conducted sequence comparisons which
revealed that the recently characterized riboflavin
transporter (RFT) of Lactococcus lactis, RibU [16] is a
member of a moderately sized family of five putative
α-helical transmembrane spanning (TMS) proteins that
is distantly related to the 10 TMS transporters of the
BASS and Acr3 families. This first became apparent
following BLAST searches of the Transporter Classification
Database (TCDB). In this paper, we show that
the putative 10 TMS proteins of the latter two families
arose by intragenic duplication of an element encoding
a 5 TMS protein similar to RibU and its orthologue
in Bacillus subtilis, YpaA. We further identify additional
families within this ubiquitous superfamily,
demonstrating the presence of six families, three with
a single 5 TMS repeat unit and three with two, dupli-
cated, 5 TMS repeat units. Most unexpectedly, two of
the three families with a single 5 TMS unit exhibit fea-
tures of catalytic proteins. One of these families, the
SHK family, is a coherent group of proteins of similar
structure with the N-terminal hydrophobic transporter
domain linked to a C-terminal hydrophilic sensor his-
tidine kinase (SHK) domain. The other, the KPSH
family, is a heterogeneous group of multidomain pro-
teins, each exhibiting a different set of domain combi-
nations, suggesting differing catalytic and regulatory
functions. Catalytic domains in these proteins include
kinases, phosphatases, cyclic di-GMP synthetases and
cyclic di-GMP hydrolases (KPSH). None of the four
members of the KPSH family have been functionally
characterized, but the sequence similarity with charac-
terized proteins and protein domains allows us to
make functional predictions with a high degree of con-
fidence. The SHK and KPSH families have been
briefly described previously [17]; they are listed in the
Pfam and Interpro databases as the ‘5 TM receptors
of the 5TMR-LYT domain’ (PF07694) and the ‘5 TM
receptors of the LytS-YhcK type transmembrane
region’ (IPR011620), respectively. Finally, one of the
families (UNK or unknown family), consisting of
putative transporters with two tandemly repeated
5 TMS units, includes homologues with no function-
ally characterized members. In this case, we have no
basis for making confident predictions of substrates
transported or the energy coupling mechanism(s)
involved. The observations reported here reveal that
this superfamily is far more diverse than was previ-
ously recognized.
Results
Homologues with a basic 5 TMS unit
Using the 5 TMS riboflavin transporter (YpaA; Bsu1
in Table S1) of B. subtilis as the query sequence, PSI-
BLAST searches against the NCBI protein database
with iterations [18] brought up the homologues listed
in Tables S1–S3 on our website (http://biology.ucsd.edu/msaier/supmat/BART/).
These include the characterized
riboflavin transporter, RibU of L. lactis
(Lla3 in supplementary Table S1) [16]. The what pro-
gram [19] was used to predict the topologies of individ-
ual proteins. A multiple alignment was derived using
the clustal x program [20], and the treeview pro-
gram [21] was used to draw a phylogenetic tree
(data not shown). This tree revealed that these proteins
fall into three subfamilies which we call the RFT
(58 proteins), SHK (31 proteins) and KPSH (4 proteins)
families (see above). While the RFT family includes
both bacterial and archaeal proteins, the SHK and
KPSH families included members that are derived
exclusively from bacteria. Most of these bacterial
proteins are from either proteobacteria or firmicutes
(Tables S1–S3).
Using the gap and ic programs [4,5], we established
that all of the proteins listed in Tables S1–S6 are
homologous in the regions of their transmembrane
domains (see below). The hydrophobic domains of
members of the three 5 TMS families are relatively
similar in sequence to each other, as are those of mem-
bers of the three 10 TMS families (comparison scores
of ≥ 15 SD). The 5 TMS proteins are more distantly
related to the 10 TMS proteins.
Homologues with a basic 10 TMS unit
To identify the protein homologues of the 10 TMS
families of the bile ⁄ arsenite ⁄ riboflavin transporter
(BART) superfamily, the ArsB protein of B. subtilis
(P45946) was used as the query sequence in PSI-
BLAST searches conducted with 11 iterations. Hun-
dreds of homologues were retrieved. Redundancies
were removed, leaving 285 protein sequences (supple-
mentary Tables S4–S6). About 82% are from bac-
teria while 15% are from eukaryotes, and 3% are
from archaea. A phylogenetic tree was generated
(data not shown). The proteins proved to fall into
three major families BASS, Acr3 and UNK. All
three families contain proteins from bacteria, archaea
and eukaryotes, and all three families include pro-
teins from both Gram-positive and Gram-negative
bacteria. However, there are some organismal distinc-
tions. For example, within the eukaryotic domain,
the BASS family has homologues from plants, ani-
mals and fungi, but the Acr3 family has only fungal
protein members, and the UNK family consists only
of animal and plant proteins. These distinctions
undoubtedly correlate with distinctive functions. The
fact that eukaryotes have 10 TMS members of the
BART superfamily, but not 5 TMS members, may
reflect the tendency of eukaryotic proteins to become
larger during evolution, possibly for purposes of
complex formation, subcellular targeting and regula-
tion [22].
In the archaeal domain, only one archaeal subdivi-
sion, the Euryarchaeota, is represented. However, the
genera represented differ depending on the family.
The BASS family has homologues only from Pyrococ-
cus, the Acr3 family has proteins from Archaeoglobus,
Pyrococcus and Thermococcus, and the UNK family
has homologues only from Haloarcula and Methano-
sarcina. The low representation of archaeal homo-
logues is worthy of note.
The BASS and UNK families have equal numbers of
eukaryotic homologues (23 and 24, respectively), about
23% and 16%, respectively, of the total numbers of
members of these two families. The Acr3 family has
just 5% of its members derived from eukaryotes.
Many organisms encode within their genomes more
than one paralogue of the 10 TMS BART superfamily
proteins, but few if any seem to encode more than
four. No archaeon has more than one. Among the
eukaryotes, the fungi appear to have just one or
two per organism while most fully sequenced genomes
of plants and animals encode either two or three.
Bacteria represented have one to four 10 TMS para-
logues.
Preliminary evidence for homology of the 5 and
10 TMS proteins of the BART superfamily
When TCDB was searched using TC-BLAST [18,25]
with the YpaA protein of B. subtilis (Bsu1 in cluster 3
of Fig. 1B) as the query sequence, the ArsB protein of
B. subtilis was retrieved with an e-value of 0.006. Resi-
dues 22–187 in YpaA (TMSs 2–5) aligned with resi-
dues 25–167 in ArsB (TMSs 2–5) showing 26%
identity and 42% similarity. When the best conserved
region of this binary alignment was examined with the
gap program and 500 random shuffles, a comparison
score of 7.0 SD was obtained. These values are already
sufficient to suggest, but not establish, homology.
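
The shuffle-based comparison score described above can be sketched as a z-score: the score of the real alignment minus the mean score obtained against randomly shuffled versions of one sequence, divided by the standard deviation of those shuffled scores. The code below is a minimal illustration of that idea and not the gap program itself. The scoring scheme (match +1, mismatch 0, gap −1) and both function names are assumptions chosen for brevity.

```python
import random
import statistics


def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = 0, gap: int = -1) -> int:
    """Needleman-Wunsch global alignment score with a toy scoring scheme."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def comparison_score(a: str, b: str, shuffles: int = 500, seed: int = 0) -> float:
    """Return (real score - mean shuffled score) / SD of shuffled scores."""
    rng = random.Random(seed)
    real = global_alignment_score(a, b)
    residues = list(b)
    shuffled_scores = []
    for _ in range(shuffles):
        # Shuffling preserves amino acid composition but destroys residue order.
        rng.shuffle(residues)
        shuffled_scores.append(global_alignment_score(a, "".join(residues)))
    sd = statistics.stdev(shuffled_scores)
    if sd == 0:
        raise ValueError("shuffled scores have zero variance")
    return (real - statistics.mean(shuffled_scores)) / sd
```

Because shuffling keeps the residue composition of the second sequence, a score computed this way is not inflated merely by the biased compositions typical of integral membrane proteins, which is the kind of correction the text attributes to the gap and ic programs.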
The sequence similarity between the 5 TMS proteins
and the first repeat unit of the 10 TMS proteins was
substantially greater than observed when the 5 TMS
protein sequences were compared with the second
repeat units of the 10 TMS proteins. This observation
led us to suggest that when the 5 TMS proteins (which
presumably function as homodimers) duplicated to
give 10 TMS proteins, the first repeat unit retained its
original topology and its primary, generalized, trans-
port function, while the second repeat unit diverged in
sequence to a greater extent to assume the opposite
topology in the membrane and to serve a more special-
ized, permease-specific function. A generalized function
might, for example, be energy coupling, while a
specialized function might be substrate recognition.
Precedence for these concepts has been published pre-
viously [26–30]. Homology between the 5 TMS and
10 TMS proteins is established below.
The riboflavin transporter (RFT) family
The proteins of the RFT family within the BART
superfamily are presented in Table S1, and the
multiple alignments of their sequences are shown in
Figure S1A on our website. In Table S1 and sub-
sequent tables, the proteins are arranged first accord-
ing to phylogenetic cluster, and second according to
position in that cluster. Using the what program [19]
and the hmmtop program [31], most homologues have
five putative TMSs although some were predicted to
have six (Table S1).
The average hydropathy and similarity plots for this
family are shown in Fig. 1A. There are five peaks of
average hydrophobicity corresponding to five peaks of
average similarity. It therefore appears that these pro-
teins share a common 5 TMS topology. The amphi-
pathicity plot (not shown) revealed no distinctive
characteristics. The multiple alignments upon which
these plots were based (Fig. S1A) showed no single
residue position with full residue conservation or even
full conservation of residue type. However, as shown
in Fig. 1A, TMS 2 is best conserved. It shows the fol-
lowing consensus sequence:
D F S D V P (Hy)₃ G G (Hy)₃ G P (Hy)₂ G (Hy)₆ K N (Hy)₃ Y (Hy)₂ X G X₃ G
(alignment positions 76–114; X, any residue; Hy, any
hydrophobic residue; italic residues, consensus residues
that are common to those in the SHK family; under-
lined residues, consensus residues that are common to
those in the KPSH family.)
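
A consensus of this kind can be turned into a regular expression, which also allows a check that the number of positions it spans agrees with the stated alignment range (76–114, i.e. 39 positions). The sketch below is illustrative only. The plain-text notation (`Hy3` for three hydrophobic residues, `X3` for three arbitrary residues), the function name and, in particular, the residue set used for "Hy" are assumptions, since the text does not list which residues it counts as hydrophobic.

```python
import re
from typing import Tuple

# Assumed hydrophobic residue set for "Hy" (not specified in the text).
HYDROPHOBIC = "AVLIMFWC"

# A token is Hy, X, a single residue letter, or a bracketed set such as [ST],
# optionally followed by a repeat count.
_TOKEN = re.compile(r"(Hy|X|[A-Z]|\[[A-Z]+\])(\d*)")


def consensus_to_regex(consensus: str) -> Tuple[str, int]:
    """Convert a space-separated consensus into (regex, number of positions)."""
    parts = []
    positions = 0
    for token in consensus.split():
        m = _TOKEN.fullmatch(token)
        if m is None:
            raise ValueError(f"unrecognized token: {token!r}")
        symbol, count = m.group(1), int(m.group(2) or 1)
        if symbol == "Hy":
            unit = f"[{HYDROPHOBIC}]"
        elif symbol == "X":
            unit = "[A-Z]"
        else:
            unit = symbol  # a literal residue or a bracketed alternative
        parts.append(unit if count == 1 else f"{unit}{{{count}}}")
        positions += count
    return "".join(parts), positions


# TMS 2 consensus of the RFT family, transcribed into the assumed notation.
RFT_TMS2 = "D F S D V P Hy3 G G Hy3 G P Hy2 G Hy6 K N Hy3 Y Hy2 X G X3 G"

if __name__ == "__main__":
    pattern, n = consensus_to_regex(RFT_TMS2)
    print(n, pattern)
```

A pattern built this way is far stricter than a real consensus, which tolerates deviations at most positions, so it is better suited to checking the transcription than to searching sequence databases.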
The clustal x-derived phylogenetic tree of the RFT
family is shown in Fig. 1B, and the bootstrapped tree is
shown in Figure S1B. We also derived paup-based trees
using both neighbor joining and parsimony algorithms
(Figs S1C and S1D, respectively). Neighbor joining
bootstrapped trees for all six families (supplementary
Figs S1–6B and S1–6C) as well as parsimony trees (sup-
plementary Figs S1–6D) are provided on our website.
The neighbor joining and parsimony trees, with or
[Figure 1 graphic: panel A plots Value against Alignment Position (1–200), with hydropathy peaks numbered 1–5; panel B is a phylogenetic tree with clusters numbered 1–4.]
Fig. 1. (A) Average hydropathy (top) and similarity (bottom) plots for the RFT family. The AVEHAS program [79] was used to generate the plots shown here (and elsewhere in this paper), based on the CLUSTAL X [20] multiple alignment as shown in Fig. S1. The proteins and their properties are tabulated in Table S1A on our website (http://biology.ucsd.edu/msaier/supmat/BART). (B) Phylogenetic tree of the RFT family. The tree is based on the CLUSTAL X alignment shown in Fig. S1A. The bootstrapped tree is shown in Fig. S1B. PAUP-based trees [76] based on neighbor joining (Fig. S1C) and parsimony (Fig. S1D) are also available on our website. All tables of the proteins (Tables S1–S6), multiple alignments of the protein members of the six families of the BART superfamily (RFT, SHK, KPSH, BASS, ACR and UNK; Figs S1A–S6A) as well as the bootstrapped trees (Figs S1B–S6B) can be found on our website (http://biology.ucsd.edu/msaier/supmat/BART). PAUP trees designed using neighbor joining (with bootstrapping) and parsimony (without bootstrapping) can be found on our website in Figs S1C–S6C, and S1D–S6D, respectively. The format of presentation is the same for Figs 2–6.
without bootstrapping, are very similar. Four clusters
are apparent. Cluster 1 consists of proteins from firmi-
cutes and one Actinobacterium, Rubrobacter xylanophi-
lus (Rxy1). This last protein falls into a subcluster with
three firmicute proteins, showing that although most of
these proteins follow the phylogenies of the host organ-
isms, this is not true of Rxy1. Proteins in cluster 1 show
a broader size range [176–222 amino acyl residues (aas)]
than for the other three clusters, but proteins in all four
clusters are of similar sizes.
Cluster 2 proteins are exclusively from firmicutes,
and seven of the nine homologues have paralogues in
cluster 1. The other two (Lme1 and Ooe2) apparently
lack paralogues in the RFT family. All but two pro-
teins in cluster 3 are from firmicutes, and the two
exceptions (Tma1 and Blo2) are distantly related to
each other and all other homologues of cluster 2. Clus-
ter 2 contains the characterized riboflavin transporter,
RibU, from Lactococcus lactis (Lla3) [16].
Cluster 3 contains the functionally characterized ribo-
flavin transporter of B. subtilis (Bsu1; YpaA; C. Vogl,
unpublished results). Because there is extensive overlap
of organismal sources between clusters 1 and 2, as well
as between clusters 1 and 3 (but not between clusters 2
and 3), we suggest that the proteins in cluster 1 primar-
ily represent one set of functionally related orthologues,
different from those in clusters 2 and 3, which may,
however, all be orthologous, serving a single function.
Three archaeal proteins comprise cluster 4. These
proteins are also likely to be orthologous to each
other, possibly also to cluster 1 or cluster 2 ⁄ 3 proteins.
The SHK family
Thirty-one proteins comprise the current SHK family
(supplementary Table S2 and Figs S2A–D). These pro-
teins have an N-terminal 6 TMS hydrophobic domain
(Fig. 2A) where TMS 0 is unique to the SHK family.
It is, however, well conserved, suggesting that it serves
an important unified function in proteins of the SHK
family. TMSs 1–5 in Fig. 2A correspond in sequence
to TMSs 1–5 in the RFT family (Fig. 1A). Note that
in both Figs 1A and 2A, TMSs 1, 2 and 4 are more
hydrophobic than peaks 3 and 5.
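
The idea behind an average hydropathy plot can be sketched as follows: each alignment column is assigned the mean hydropathy of its residues, and the resulting profile is smoothed with a sliding window so that putative TMSs appear as peaks. This is an illustration only and not the AveHAS program. The Kyte-Doolittle scale, the default window size and the function names are assumptions, as the text does not state which scale or window was used.

```python
from typing import List

# Kyte-Doolittle hydropathy values (assumed scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def column_hydropathy(alignment: List[str]) -> List[float]:
    """Mean hydropathy per alignment column, ignoring gaps and unknown symbols."""
    if not alignment:
        raise ValueError("alignment is empty")
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("aligned sequences must have equal length")
    profile = []
    for col in range(width):
        values = [KYTE_DOOLITTLE[seq[col]] for seq in alignment
                  if seq[col] in KYTE_DOOLITTLE]
        profile.append(sum(values) / len(values) if values else 0.0)
    return profile


def smooth(profile: List[float], window: int = 9) -> List[float]:
    """Centered sliding-window average; the window is truncated at the ends."""
    half = window // 2
    smoothed = []
    for i in range(len(profile)):
        lo, hi = max(0, i - half), min(len(profile), i + half + 1)
        smoothed.append(sum(profile[lo:hi]) / (hi - lo))
    return smoothed
```

Applied to a multiple alignment, `smooth(column_hydropathy(alignment))` would yield a curve in which runs of hydrophobic columns stand out as peaks, comparable in spirit to the numbered peaks discussed for Figs 1A and 2A.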
The 6 TMS hydrophobic domain in SHK family
proteins is followed by three recognizable domains.
The first is a cGMP-binding phosphodiesterase ⁄
Anabaena adenyl cyclase ⁄ E. coli FhlA (GAF) domain,
present in phytochromes, cyclic GMP phosphodiest-
erases and other sensory transduction proteins [32].
The second is a large, well conserved sensor kinase
domain, homologous to thousands of other sensor kin-
ases in the NCBI database. Those included in this
study are all more similar to each other than they are
to any of the other sensor kinases, and only these have
the homologous N-terminal hydrophobic domain com-
mon to the RFT family proteins. The third domain is
the HATPase-c domain, a histidine kinase-like ATPase
domain. These domains are found not only in sensor
kinases but also in topoisomerases I and II,
heat shock proteins of the HSP90 family, phytochrome
ATPases and DNA mismatch repair enzymes [33].
Because sensor kinase domains must be in the cyto-
plasm, we can infer that TMS 0 (Fig. 2A) passes
through the membrane from inside the cell to the
outside. By analogy, the 5 TMS proteins of the RFT
family may have their N-termini outside and their
C-termini inside (see below).
Examination of the SHK family multiple alignments
(Fig. S2A) revealed many fully conserved residues.
The most condensed region of conservation within the
hydrophobic domain occurred in TMS 2 where the
consensus sequence is:
N T R (Hy)₂ G (Hy)₃ (G) G (Hy)₂ G G P (Hy)₂ G (Hy)₃ G L T G G L H R Y S Hy G
(alignment positions 113–145; Hy, any hydrophobic
residue; bold, fully conserved; italic residues, common
to the consensus sequence residues for the RFT family;
underlined residues, common to those of the KPSH
family).
A few fully conserved positions are also present
in TMSs 3, 4 and 5 as well as the downstream hydrophilic
domains. The latter include an A V A I T D R E K I L A
consensus region with three fully conserved
residues (alignment positions 292–303). Examination
of Table S2 and Fig. 2B reveals that only one
organism, Photobacterium profundum, has more than
one SHK member, and the two paralogues in this
organism are distantly related, falling into different
clusters of the phylogenetic tree. With two exceptions,
proteins of the SHK family are of fairly uniform size
(556–597 aas). Both firmicutes and proteobacteria as well
as one homologue from Fusobacterium nucleatum, are
represented. All members of the SHK family are pre-
dicted to have six TMSs (Table S2).
The phylogenetic tree for the SHK family (Fig. 2B)
shows four clusters. Bootstrap values are provided in
Figures S2B and S2C. Cluster 1 proteins are exclu-
sively from firmicutes, while cluster 3 and 4 proteins
are exclusively from proteobacteria. Each of these
three clusters is coherent with all proteins within any
one cluster branching from each other at points distant
from the center of the tree. The two short variants,
Ahy1 and Ppr2 (440 and 451 aas), from Aeromonas
hydrophila and Photobacterium profundum, respectively,
comprise cluster 3. Cluster 2 is most diverse in
sequence as well as organismal source. These loosely
clustered proteins are from proteobacteria, firmicutes
and Fusobacterium nucleatum. The presence of this
cluster clearly suggests that members of the SHK fam-
ily are not all orthologous.
The KPSH family
Four proteins comprise the KPSH family (Table S3).
These proteins are about equally diverse in sequence as
revealed by the phylogenetic tree shown in Fig. 3B.
Each is from a different bacterial subdivision, one
from Deinococcus geothermalis, one from a γ-proteobacterium,
one from a δ-proteobacterium, and one
from a putative uncultured archaeon. The tree is based
on the multiple alignment shown in Fig. S3A. These
proteins exhibit six N-terminal peaks of hydrophobici-
ty (peaks 0–5 in Fig. 3A), corresponding to TMSs 0–5
in Fig. 2A. TMSs 1–5 correspond to TMSs 1–5 in
Fig. 1A. As with TMS 2 of the RFT and SHK famil-
ies, TMS 2 of the KPSH family is the best conserved
with the following consensus sequence:
G (Hy)₃ D Hy R X (Hy)₅ X G L F X G X L P (Hy)₁₀ Y R L X Hy G G
(alignment positions 72–110 in Fig. S3A; X, any resi-
due; Hy, any hydrophobic residue; bold, fully con-
served; italic residues are common to those in the RFT
family consensus sequence; underlined residues are
[Figure 2 graphic: panel A covers alignment positions 1–600, with hydropathy peaks numbered 0–5; panel B is a phylogenetic tree with clusters numbered 1–4.]
Fig. 2. (A) Average hydropathy (top) and
similarity (bottom) plots for the SHK family.
(B) Phylogenetic tree for the SHK family.
The multiple alignment and list of proteins
used are presented in Fig. S2A and
Table S2, respectively. Four homologues of
abnormal size, listed in Table S2, were eli-
minated when the Fig. S2A alignment was
derived. The bootstrapped trees are shown
in Figs S2B and S2C. The parsimony tree is
shown in Fig. S2D.
common to residues in the SHK family consensus
sequence.)
The four proteins that comprise the sequence diver-
gent members of the KPSH family are all multidomain
proteins that seem to share only the characteristic of
having a common N-terminal hydrophobic domain. In
the case of Son2 from Shewanella oneidensis (998 resi-
dues), following the N-terminal 6 TMS domain are
three PAS helix–loop–helix, protein–protein interaction
domains, common in proteins involved in energy sens-
ing and signal transduction [34,35], a GGDEF domain
(domain containing the conserved GDEF motif) and
an EAL domain (domain containing the conserved
EAL motif). The latter two domains are likely to be
involved in cyclic di-GMP synthesis and hydrolysis,
respectively [36–38].
The Uar2 protein, from an ‘uncultured archaeon’ is
of 654 aas and has (following the common N-terminal
hydrophobic domain) a LytS domain followed by a
COG4191 domain (of unknown function), a histidine
kinase A dimerization phosphoacceptor (HisKA)
domain, and a C-terminal HATPase-c domain. The
LytS domain is homologous to LytS, a signal transduc-
tion regulator of cell autolysis [17]. The HisKA domain
is a conserved bacterial histidine sensor kinase domain
[39], and the HATPase-c domain resembles a histidine
kinase ATPase domain [40]. Uar2 is similar in several
of these respects to members of the SHK family.
Following the N-terminal 6 TMS domain of the
Gsu1 protein from Geobacter sulfurreducens are at least
two AtoS-type sensor kinase domains [41], followed by
(a) a HisKA domain, (b) a HATPase-c domain, and (c)
a signal receiver (REC) domain at the extreme C-termi-
nus of this 1112 residue protein. Finally, the Dge1 pro-
tein from Deinococcus geothermalis is relatively short
(349 residues) with a single GGDEF domain following
the hydrophobic transmembrane domain.
The BASS family
Functionally characterized members of the BASS family
catalyze Na+:bile acid symport [15,42]. These symporters
exhibit broad specificity, taking up a variety of
nonbile organic compounds as well as taurocholate
and other bile salts [43]. They have been identified in
intestinal, liver and kidney tissues of animals, and at
least three isoforms are present in a single species such
as humans. The BASS family is also called the solute
carrier family 10 [23,24,43]. Functionally characterized
members of the BASS family appear to possess their
bile acid binding sites within and preceding the last
transmembrane spanner [23,44].
A BASS in the apical membrane of the human ileal
intestine catalyzes the electrogenic uptake of bile acids
with a stoichiometry of bile acid:Na+ of 1 : 2 [24]. This
protein is associated with the 16 kDa subunit c of the
vacuolar proton pump, an association that may in part
account for its apical location [45]. Thus, the vacuolar
proton pump-associated apical sorting machinery may
play a role in sorting the apical Na+:bile symporter to
the basolateral membrane.
The rat liver Na+ ⁄ taurocholate cotransporter is subject
to elaborate regulation in response to cyclic AMP
and cell swelling [46,47]. It has two N-terminal,
[Figure 3 graphic: panel A plots Value against Alignment Position (1–1100), with hydropathy peaks numbered 0–5; panel B is a phylogenetic tree of Gsu1, Uar2, Dge1 and Son2.]
Fig. 3. (A) Average hydropathy (top) and similarity (bottom) plots for
the four proteins of the KPSH family. (B) The phylogenetic tree for
these four proteins. The multiple alignment (Fig. S3) and list of pro-
teins (Table S3) are available on our website. The bootstrapped
trees are shown in Figs S3B and S3C. The parsimony tree is shown
in Fig. S3D.
N-linked carbohydrate sites and two Tyr-based basolat-
eral sorting motifs at its carboxyl terminus (YEKI and
YKAA). The former targets the protein to the apical
membrane in the absence of the latter, but the latter
overrides the former, targeting the protein to the baso-
lateral membrane [48]. The ileal homologue has a
14-residue cytoplasmic tail with a β-turn structure that
targets the protein to the apical membrane [49].
The human orthologue of the rat Na+ ⁄ taurocholate
symporter (TC #2.A.28.1.1) (NTCP; SLC10A1) exhib-
its multiple single nucleotide polymorphisms in popula-
tions of European, African, Chinese and Hispanic
people [44]. Four nonsynonymous single nucleotide
polymorphisms are associated with significant loss of
transport function or change in substrate specificity.
One form, found in Chinese Americans, does not cata-
lyze bile acid uptake but catalyzes estrone sulfate
uptake. This transporter may play a role in mainten-
ance of enterohepatic recirculation of bile acids [44].
The members of the BASS family can be found on
our website (Table S4), and the clustal x alignment
of their sequences, shown in Fig. S4A, provides the
basis for the average hydropathy and similarity (AveHAS)
plots shown in Fig. 4A as well as the tree shown
in Fig. 4B and the bootstrapped trees shown in
Figs S4B and S4C. As revealed in Table S4, most
organisms represented have only one member of the
BASS family, but two can be found in a few bacteria,
plants and animals, and animals can have up to three.
Only Bos taurus and Tetraodon nigroviridis have three.
Most of the homologues from prokaryotes fall into the
size range 300–350 aas although a few are smaller or
larger. The plant proteins are about 400 aas in length,
and the animal homologues range from about 350–550
aas with one protein from the chicken having 679 aas.
The average hydropathy and similarity plots reveal
10 conserved peaks of average hydropathy. Striking
peaks of amphipathicity were observed just preceding
peak 1, between peaks 2 and 3, and between peaks 4
and 5, although striking peaks of average amphipathic-
ity were not observed in the second hydrophobic
halves of these proteins (data not shown). Only the
chicken homologue, Gga1, has an extension following
peak 10, and only Gga4 has an internal deletion not
found in the other homologues. These could be due
to errors in exon recognition. The Tni1 protein from
Tetraodon nigroviridis has several internal hydrophilic
insertions. Several proteins have N-terminal hydrophi-
lic extensions, but Gga4 has the longest. A single resi-
due proved to be fully conserved in all members of the
BASS family. This is a prolyl residue at alignment
position 451 in TMS 5. The best conserved regions
overlap the moderately hydrophobic peaks 4 (best
conserved) and 9 (less well conserved). The consensus
sequences for these two peaks are:
TMS 4: Hy A V G (Hy)₄ G C C P G G T A S N (Hy)₂ (ST) F L A L G D V
TMS 9: R (ST) Hy (STG) F Hy G Hy Q N (STG) G L (AGC) (Hy)₄
(Hy, any hydrophobic residue; residues in parentheses
represent alternative possibilities at a single position.)
The BASS family trees are shown in Fig. 4B and
supplementary Figs S4B, S4C and S4D, all of which
show excellent agreement as usual. The trees show
eight primary clusters as well as several branches that
stem from the center of the tree and therefore do not
belong to one of the primary clusters. Each of these
branches bears a bacterial protein. This tree reveals
that BASS family members cluster primarily according
to organismal type (also Table S4). Thus, clusters 1–2
consist only of prokaryotic proteins, including both
bacterial and archaeal proteins; the small cluster 3 pro-
teins are derived only from proteobacteria; cluster 4
proteins are from plants and cyanobacteria; cluster 5
proteins are from a range of nonproteobacterial types;
cluster 6 and 7 proteins derive exclusively from ani-
mals; and cluster 8 is derived only from bacteria.
Although bacterial paralogues were not observed in
clusters 1–4, in none of these clusters did the proteins
follow the phylogenies of the host organisms. Perhaps
early extragenic duplication events followed by
nonselective gene loss, or horizontal transfer of the
encoding genes, account for these results. Only rice, with
two paralogues in cluster 4 (Osa2 and Osa3), has more than
one homologue in any one of these clusters.
In contrast to clusters 1–4, the clustering patterns in
cluster 5 follow those of the source organisms. Because
each protein is derived from a different organism, these
proteins may be orthologues serving a single function.
Like clusters 1–4, the animal proteins in clusters 6 and
7 and the bacterial proteins in cluster 8 are not likely
to be orthologous although subclusters of potential
orthologues can be identified. For example, the cluster-
ing of a spirochete protein (Lin5) with the cyanobacte-
rial homologues is unexpected, and possibly resulted
from horizontal gene transfer between subdivisions.
The ACR3 family
Two proteins of the Acr3 family have been function-
ally characterized. These proteins are the ‘Acr3’ pro-
tein of Saccharomyces cerevisiae, also called the Arr3
protein [50], and the ‘ArsB’ protein of Bacillus subtilis
[51]. The latter protein is not related to ArsB of Escherichia coli.
N. M. Mansour et al. The BART superfamily
FEBS Journal 274 (2007) 612–629 © 2007 The Authors Journal compilation © 2007 FEBS 619
The Acr3 protein is present in the
yeast plasma membrane and pumps arsenite, but not
arsenate, antimonite, tellurite, cadmium or phenyl-
arsine oxide out of the cell in response to the proton
motive force [50]. The Bacillus protein exports both
arsenite and antimonite [51]. The exact transport
mechanism is not established, but a uniport or cation
antiport mechanism seems probable.
Table S5 and Fig. S5A on our website present the
members of the Acr3 family and show the Clustal X
multiple alignment, respectively, upon which the aver-
age hydropathy (Fig. 5A, top) and average similarity
(Fig. 5A, bottom) plots as well as the phylogenetic tree
(Fig. 5B) are based. The bootstrapped and parsimony
trees are shown in Figs S5B, S5C and S5D on our
website. Examination of Table S5 reveals that most
organisms represented have only one Acr3 homologue,
and those with two are all bacteria. No archaeon
or eukaryote has more than one, and no organism
has more than two.
Examination of the size variations observed for these
proteins revealed that most of the prokaryotic
[Figure 4. Panel A: plots of Value against Alignment Position, with hydrophobic peaks 1–10 labelled. Panel B: phylogenetic tree with clusters 1–8 labelled.]
Fig. 4. (A) Average hydropathy (top) and
similarity (bottom) plots for the BASS family.
(B) The phylogenetic tree for the BASS fam-
ily proteins. The list of proteins and the mul-
tiple alignment upon which these plots
were based can be found in Table S4 and
Fig. S4A on our website, respectively. The
bootstrapped trees are shown in Figs S4B
and S4C. The parsimony tree is shown in
Fig. S4D.
homologues are of similar sizes (320–390 aas) with just
a few exceptions. All of the fungal proteins are larger
(389–454 aas), and the two mycobacterial orthologues
are still larger (498 aas). The latter two proteins have
hydrophilic C-terminal extensions of about 140 resi-
dues. These extensions correspond to the entirety of low
molecular weight phosphatases of the LMWP family,
some of which (e.g., Wzb of E. coli; P0AAB2; 147 aas)
hydrolyze phosphotyrosine proteins, regulating capsular
exopolysaccharide production [52–54]. Possibly these
transporters play a role in polysaccharide secretion.
The fungal homologues proved to have either a 50
residue hydrophilic insertion between putative TMSs 8
and 9, or an N-terminal hydrophilic extension in front
of TMS 1, both of unknown function. The average
hydropathy and similarity plots reveal 10 well con-
served peaks of hydrophobicity (1–10) as well as an
additional C-terminal peak (11) present in several
homologues, but not in many others. Two prolyl resi-
dues are fully conserved, one at alignment position 185
in TMS 3 and the other at alignment position 337
in TMS 6. Nevertheless, the best conserved peaks over-
all were TMSs 4 and 9 as for the BASS family.
The consensus sequences for these two TMSs are:
TMS 4: G A A P C T A A (Hy)₃ W S X Hy (AST) X G (DET) P X (FY) (TAC)
TMS 9: A A P (SA)₂ (Hy)₂ G A S N F F E Hy A Hy A Hy A Hy (SAG) Hy F G
(Hy, any hydrophobic residue; residues in parentheses
represent alternative possibilities at a single position.)
Phylogenetic trees for the Acr3 family are shown in
Figs 5B, S5B, S5C and S5D, all in good qualitative
agreement. Of the eight bacteria having two para-
logues, all but one (Dechloromonas aromatica) have
one of these paralogues in cluster 1 and the other in
cluster 3. D. aromatica has one in cluster 2 and one
in cluster 3. It is interesting to note that bacterial and
archaeal proteins are found in all three clusters, but
the eukaryotic proteins are all in cluster 3. These fun-
gal proteins cluster together, distant from any of the
bacterial proteins which cluster into two distinct sub-
clusters of cluster 3. The functionally characterized
arsenite exporters, Sce1 of Saccharomyces cerevisiae,
and Bsu1 of B. subtilis, are in the fungal and pro-
karyotic subclusters of cluster 3, respectively (see
below).
Cluster 1 is diffuse, consisting of distantly related
proteins. Subclusters correspond to specific types of
bacteria (firmicutes or proteobacteria). The same is
observed for some of the subclusters in the more com-
pact cluster 2, but there are also some notable
exceptions [e.g., Cth1 (from a firmicute) clusters with
proteobacterial proteins, and Rpa1 (from a planctomycete)
clusters with Msp1 from an α-proteobacterium].
The two primary subclusters in cluster 3 include
proteins exclusively from fungi and exclusively from
bacteria and archaea, respectively. The latter subcluster
is split into two subsubclusters, one derived from
Actinobacteria with one exception (Mma1 from
Magnetospirillum magnetotacticum, an α-proteobacterium),
the other derived from various other prokaryotic sub-
divisions, but not from Actinobacteria. This last one
includes proteins from proteobacteria, firmicutes,
cyanobacteria, chlorobi and euryarchaeota.
The UNK family
The members of the UNK family are listed in
Table S6, and the multiple sequence alignment is
shown in Fig. S6A. The latter provided the basis for
the average hydropathy and similarity plots shown in
Fig. 6A and the tree presented in Fig. 6B. The UNK
proteins are derived from eukaryotes (animals, plants
and fungi) and bacteria (proteobacteria and actinobac-
teria primarily). No two UNK family proteins are
derived from a single organism.
The average hydropathy plot reveals 10 conserved
peaks of hydropathy. A single strong peak of amphi-
pathicity (angle set at 100°) was observed between
putative TMSs 6 and 7 (data not shown). As expected,
based on the properties of the previously described
families, peaks 4 and 9 were only weakly hydrophobic.
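For readers who wish to reproduce plots of this kind, the sketch below computes a sliding-window mean hydropathy and a hydrophobic moment at a helical angle of 100° per residue for a single sequence. It is an illustration under stated assumptions, not the program used for the figures. The Kyte-Doolittle scale, the window length and all function names are choices made for this example, and averaging over the columns of a multiple alignment is not shown.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed scale; the article does not
# say which scale its plotting program uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydropathy(segment):
    """Average hydropathy of the residues in one window."""
    return sum(KYTE_DOOLITTLE[r] for r in segment) / len(segment)


def hydrophobic_moment(segment, angle_deg=100.0):
    """Hydrophobic moment per residue for a given angle between residues.

    Each residue contributes a vector whose length is its hydropathy and
    whose direction advances by angle_deg per position; a large resultant
    means the hydrophobic residues lie on one face of the helix.
    """
    step = math.radians(angle_deg)
    sin_sum = sum(KYTE_DOOLITTLE[r] * math.sin(i * step)
                  for i, r in enumerate(segment))
    cos_sum = sum(KYTE_DOOLITTLE[r] * math.cos(i * step)
                  for i, r in enumerate(segment))
    return math.hypot(sin_sum, cos_sum) / len(segment)


def sliding_profile(sequence, window, score):
    """Apply a scoring function to every full window along the sequence."""
    return [score(sequence[i:i + window])
            for i in range(len(sequence) - window + 1)]
```

Calling sliding_profile(sequence, 19, mean_hydropathy) gives a hydropathy trace and sliding_profile(sequence, 19, hydrophobic_moment) an amphipathicity trace; the window of 19 residues is a hypothetical setting.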
Several fully conserved residues were found: prolyl
and glycyl residues in peak 4, a P in peak 6, a K pre-
ceding peak 9, and a P and a Q in peak 10. The best
conserved peaks were 4 and 5, and 9 and 10 (Fig. 6A).
One protein, Mgr2, had an internal deletion near the
N-terminus as well as a long C-terminal extension of
about 300 residues.
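Fully conserved residues of the kind listed above can be located by scanning the columns of a multiple alignment. The following is a minimal sketch that assumes the alignment is supplied as equal-length strings with '-' marking gaps; the function name and input format are illustrative and are not taken from the software used in this study.

```python
def fully_conserved_columns(alignment):
    """Return (1-based alignment position, residue) for invariant columns.

    A column counts as fully conserved only if every sequence carries the
    same residue there and none of them has a gap.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for pos in range(length):
        column = {row[pos] for row in alignment}
        if len(column) == 1 and "-" not in column:
            conserved.append((pos + 1, column.pop()))
    return conserved
```

Positions are reported in alignment coordinates, which is why a conserved residue can carry a position number larger than the length of any single protein.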
Consensus sequences for the four best conserved
regions are:
P4: G (Hy)₄ C X L P (ST) T V Q S S I A F T S Hy A K G N V
P9: F C G S K K S L A (ST) G Hy P M A X Hy Hy F
P5: S S (Hy)₂ G (Hy)₃ T P (Hy)₃ T P (Hy)₃ G (Hy)₃
P10: G (Hy)₄ P (Hy)₃ F H Q I Q L M V C A (Hy)₂
(X, any residue; Hy, any hydrophobic residue; bold,
fully conserved.)
Limited sequence similarity can be observed between
the P4 and P9 sequences, and between the P5 and P10
sequences.
... binding of the ligand to the outside. Such a scenario has been documented in the E. coli phosphate-specific ABC transporter which interacts noncovalently with a sensor kinase (PhoR) to influence its activity [66]. Fusion of the transporter domain to the sensor kinase domain suggests a close functional relationship between the two domains [58,67–69]. The last family within the BART superfamily, the KPSH family, ...

... be present in the cytoplasm, this suggests that the additional N-terminal TMSs probably have their N-termini in the cytoplasm. If so, the conserved 5 TMS unit goes from out to in. This would suggest that members of the RFT family, with 5 putative TMSs, may also have their N-termini outside and their C-termini inside. Because the 5 TMS transporters show greatest sequence similarity with the first N-terminal ...

... ligand through the membrane could actually be the sensed event that activates or inhibits the sensor kinase as in the case of phosphoryl transfer-dependent regulation via the E. coli phosphoenolpyruvate-dependent phosphotransferase system [64,65]. Third, the N-terminal domain might be both a sensor and a transporter, acting on the same ligand, but with the sensor function independent of the transport ...

... superfamilies of transporters where transporter homologues serve as receptors, either while retaining their transport function, or while losing it [3,58–63]. In a few members of the sodium:solute symporter superfamily, full-length transporter domains are fused to sensor kinase domains [3]. As for the SHK family, it is not known if the transporter domain is active as a transporter, or if it functions exclusively ...
families of the BART superfamily with no functionally characterized members. One (UNK) includes members that look like typical 10 TMS porters. The second (SHK) proved to be a coherent family of structurally similar proteins with an N-terminal 6 TMS transporter domain with TMSs 2–6 being homologous to the 5 TMS element that characterizes all members of the BART superfamily. Because the C-termini of these proteins ...

... hydropathy (top) and similarity (bottom) plots for the Acr3 family. (B) The phylogenetic tree for the Acr3 family proteins. The list of proteins and the multiple alignment upon which these plots were based can be found in Table S5 and Fig. S5A on our website, respectively. The bootstrapped trees are shown in Figs S5B and S5C. The parsimony tree is shown in Fig. S5D.

The phylogenetic tree shown in Fig. 6B reveals ...

... units in the 10 TMS homologues, we suggest that these proteins also display their N- and C-termini outside and their central loops inside. These predictions were confirmed when we conducted charge distribution studies (data not presented). The positive inside rule [55–57] has provided valid predictions for transport protein topology. Its application to members of the six families of the BART superfamily ...

... for the UNK family. (B) Phylogenetic tree for the UNK family proteins. The list of proteins and the multiple alignment upon which these plots were based can be found in Table S6 and Fig. S6A on our website, respectively. The bootstrapped trees are shown in Figs S6B and S6C. The parsimony tree is shown in Fig. S6D. ...
single subcluster, as do the β-proteobacterial proteins, the γ-proteobacterial proteins of cluster 10 fall into two subclusters, one for the Pseudomonads and one for the Xanthomonads. These two γ-proteobacterial genera are known to be distantly related to each other.

Motif similarities among all 10 TMS homologues

The C-terminal regions of the consensus sequences of TMSs 4 in the three 10 TMS families ...

... sufficient to establish homology [1]. The six families described above as well as the two repeat units of the 10 TMS proteins are therefore derived from a single ancestral sequence, and consequently, they comprise a single superfamily.

Discussion

In addition to defining the phylogenetic and structural properties of the two previously recognized families (BASS and Acr3), and the newly discovered 5 TMS transport ...

... and YKAA). The former targets the protein to the apical membrane in the absence of the latter, but the latter overrides the former, targeting the protein to the basolateral membrane [48]. The ileal ...

... to each other than they are to any of the other sensor kinases, and only these have the homologous N-terminal hydrophobic domain common to the RFT family proteins. The third domain is the HATPase-c ...

... Fusion of the transporter domain to the sensor kinase domain suggests a close functional relationship between the two domains [58,67–69]. The last family within the BART superfamily, the KPSH