The CRIPTO/FRL-1/CRYPTIC (CFC) domain of human Cripto
Functional and structural insights through disulfide structure analysis
Susan F. Foley, Herman W. T. van Vlijmen, Raymond E. Boynton, Heather B. Adkins, Anne E. Cheung,
Juswinder Singh, Michele Sanicola, Carmen N. Young and Dingyi Wen
Biogen, Inc., Cambridge Center, Cambridge, MA, USA
The disulfide structure of the CRIPTO/FRL-1/CRYPTIC
(CFC) domain of human Cripto protein was determined by
a combination of enzymatic and chemical fragmentation,
followed by chromatographic separation of the fragments,
and characterization by mass spectrometry and N-terminal
sequencing. These studies showed that Cys115 forms a
disulfide bond with Cys133, Cys128 with Cys149, and
Cys131 with Cys140. Protein database searching and
molecular modeling revealed that the pattern of disulfide
linkages in the CFC domain of Cripto is the same as that in
pars intercerebralis major peptide C (PMP-C), a serine
protease inhibitor, and that the EGF-CFC domains of
Cripto are predicted to be structurally homologous to the
EGF-VWFC domains of the C-terminal extracellular
portions of Jagged 1 and Jagged 2. Biochemical studies of the
interactions of ALK4 with the CFC domain of Cripto by
fluorescence-activated cell sorter analysis indicate that the
CFC domain binds to ALK4 independently of the EGF
domain. A molecular model of the CFC domain of Cripto
was constructed based on the nuclear magnetic resonance
structure of PMP-C. This model reveals a hydrophobic
patch in the domain opposite to the presumed ALK4
binding site. This hydrophobic patch may be functionally
important for the formation of intra- or intermolecular
complexes.
Keywords: Cripto; disulfide structure; CFC domain; model.
Cripto is a member of a family of proteins that includes
human Cripto and Criptic, murine Cripto and Criptic, frog
FRL-1, zebrafish one-eyed pinhead protein (oep) and chick
Cripto [1–3]. The involvement of these proteins in early
embryonic development is well established [1,2,4–11], and
other recent investigations indicate that Cripto is
overexpressed in a number of human cancers [2]. These proteins
are characterized by two cysteine-rich structural motifs: an
epidermal growth factor (EGF)-like domain and a
CRIPTO/FRL-1/Cryptic (CFC) domain, the latter of which is
considered unique to this family. Previous characterization
of human recombinant Cripto (residues 1–169) showed that
the mature protein begins at Leu31, that Asn79 is
N-glycosylated with >90% occupancy, that Ser40 and Ser161 are
partially O-glycosylated [2], and that Thr88 is modified with
a single O-linked fucose [12]. Ser161 is the predicted ω-site for
propeptide cleavage and glycosylphosphatidylinositol (GPI)
attachment, and the segment comprising residues 170–188 is
the predicted signal peptide for GPI-anchorage (Fig. 1) [1].
Evidence of an EGF-like domain structure in Cripto-related
proteins is based on amino acid sequence homology
[1,2,4,13,14], molecular modeling [1,14], and gene structure
[1,2,4,14]. The CFC region has no predictive model for
disulfide linkage of its six cysteines.
The role of the EGF-CFC family of proteins in
embryogenesis is still being elucidated, but information to
date suggests that Cripto is required for Nodal binding to
the ActRIIB/ALK4 receptor complex and for Nodal
activation of similar to mothers against decapentaplegic
peptide-2 (Smad-2) [15–18]. Moreover, point mutation
experiments with Cripto have shown that the EGF domain
is necessary for binding to Nodal and the CFC domain is
responsible for binding to ALK4 [16,17,19]. A naturally
occurring Pro125 → Leu mutation in the CFC domain of
Cripto has been correlated with developmental anomalies
in the midline and forebrain in human fetuses, and an
engineered construct with the same Pro125 → Leu mutation
was inactive in a rescue model of the oep phenotype
in zebrafish [19]. These findings highlight the biological
importance of Cripto and underline the functional
significance of the CFC domain. In the present work, we have
solved the disulfide structure of the CFC domain and have
conducted biochemical studies that detail the interactions
of ALK4 with the CFC domain. From molecular modeling
studies, we have shown that the CFC domain of Cripto is
structurally homologous to the von Willebrand Factor
type C-like domain and that Cripto protein is structurally
similar to the C-terminal extracellular portions of Jagged 1
and Jagged 2.
Correspondence to Dingyi Wen, Biogen, Inc., 14 Cambridge Center,
Cambridge, MA 02142, USA.
Fax: +1 617 679 2616, Tel.: +1 617 679 2362.
E-mail: Dingyi_Wen@biogen.com
Abbreviations: CFC, CRIPTO/FRL-1/Cryptic; PMP-C, pars
intercerebralis major peptide C; oep, zebrafish one-eyed pinhead protein;
EGF, epidermal growth factor; LTβR, lymphotoxin-β receptor;
FACS, fluorescence-activated cell sorter; PTH, phenylthiohydantoin;
Cripto delC-Fc, Cripto (amino acids 1–169) fused to the hinge and
Fc region of human IgG1; NEM, N-ethylmaleimide; IAM,
iodoacetamide; NES, 2-(N-ethylsuccinimidyl); ESI, electrospray
ionization; VWFC, von Willebrand Factor C domain;
GPI, glycosylphosphatidylinositol.
(Received 7 May 2003, revised 3 July 2003,
accepted 11 July 2003)
Eur. J. Biochem. 270, 3610–3618 (2003) © FEBS 2003 doi:10.1046/j.1432-1033.2003.03749.x
Experimental procedures
Protein expression and purification
Recombinant human Cripto-1 was expressed in Chinese
hamster ovary cells as a C-terminally truncated form,
comprising amino acid residues 1–169, and was purified by
immunoaffinity chromatography on an anti-Cripto mAb
column [12].
Fluorescence-activated cell sorter (FACS) analysis
Analysis of Cripto–ALK4 interactions by flow cytometry
was performed essentially as described earlier [19]. Briefly,
human 293 cells were transfected with plasmids expressing
human ALK4 (provided by M. Whitman, Harvard
Medical School), full length wild-type Cripto, Cripto
(N85G/T88A) or Cripto (H120G/W123G) using Fugene
(Roche) according to the manufacturer’s instructions.
After 48 h, cells were processed for flow cytometry.
Approximately 5 × 10⁵ cells were incubated with
10 µg·mL⁻¹ of either human Cripto delC-Fc, CFC-Fc,
EGF-Fc, LTβR (lymphotoxin-β receptor)-Fc or human
ALK4-Fc (R&D Systems) followed by a PE (phycoery-
thrin)-conjugated anti-human Fc secondary Ig (Jackson
Immunoresearch).
Estimation of free thiol groups
Approximately 50 µg of human Cripto (residues 74–169,
which covers the combined EGF-CFC domains) was
incubated in 100 mM N-ethylmaleimide (NEM), 6 M
guanidine HCl, 72 mM Mes, pH 6.0, at 37 °C for 1 h.
The control sample was incubated in the same way, but
without NEM. Samples were desalted using ethanol
precipitation [20]. This treatment was followed by
complete reduction of cystines with 30 mM dithiothreitol
in 8 M guanidine HCl, 150 mM Tris HCl, pH 8.5, at
45 °C for 45 min, followed by alkylation with 75 mM
iodoacetamide (IAM) at room temperature for 1 h.
After desalting using ethanol precipitation, the sample
was deglycosylated with PNGase F (Glyko) in 2 M urea,
50 mM sodium phosphate, pH 7.6, 20 mM methylamine
HCl, 5 mM EDTA, at 37 °C overnight. Intact mass was
measured on-line using a ZMD (electrospray ionization)
mass spectrometer (Waters). The molecular masses
were generated by deconvolution with the MAXENT 1
program.
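The deconvolution step converts a series of multiply charged ESI peaks into a neutral molecular mass. The sketch below is not the MAXENT 1 algorithm (a maximum-entropy method); it only illustrates the underlying arithmetic for two adjacent charge states, assuming protonated ions. The function names and the example values are hypothetical.

```python
PROTON = 1.00728  # mass of a proton in Da


def charge_from_adjacent_peaks(mz_lower_charge: float, mz_higher_charge: float) -> int:
    """Charge z of the peak at mz_lower_charge, given the neighbouring peak
    with charge z + 1, which appears at lower m/z."""
    z = (mz_higher_charge - PROTON) / (mz_lower_charge - mz_higher_charge)
    return round(z)


def neutral_mass(mz: float, z: int) -> float:
    """Neutral mass M for an ion [M + zH]z+ observed at m/z."""
    return z * (mz - PROTON)


# Synthetic example (hypothetical 10 000 Da species, charges 10 and 11).
mz_10 = 10000.0 / 10 + PROTON
mz_11 = 10000.0 / 11 + PROTON
z = charge_from_adjacent_peaks(mz_10, mz_11)
mass = neutral_mass(mz_10, z)
```

In practice every charge state in the envelope contributes to the estimate, which is what a full deconvolution program handles.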
Generation of disulfide-linked CFC fragments of Cripto
using CNBr and endoproteinase Lys-C
Nonreduced Cripto was treated with 1 M CNBr in 70%
formic acid at room temperature for 36 h. To remove
residual CNBr, the treated sample was dried under vacuum
in a SpeedVac® concentrator, suspended in HPLC-grade
water and dried again. The water wash and drying steps
were repeated, after which the final pellet was dissolved in
200 mM Mes, pH 6.0. Approximately 60 µg of protein in
160 mM Mes, pH 6.0, 20% 2-propanol, was digested with
6 µg of endoproteinase Lys-C (WAKO) at room temperature
for 24 h. An additional 6 µg of the enzyme was added
and the sample again incubated for 24 h. Solid guanidine
HCl was added to a final concentration of 6 M to quench the
reaction.
Separation of disulfide-linked CFC fragments of Cripto
Fragments of Cripto were separated by reverse-phase
HPLC (rp-HPLC) on a 1-mm × 250-mm Vydac C4 column
using a Waters Separation Module, Model 2690. Solvent A
was 0.1% (v/v) trifluoroacetic acid in water and solvent B
was 0.08% (v/v) trifluoroacetic acid in 75% acetonitrile.
A linear gradient from 0 to 60% acetonitrile over 160 min
was applied at a flow rate of 0.07 mL·min⁻¹. The column
temperature was 30 °C. Fractions were collected at 1-min
intervals. Fractions that were determined by mass analysis
to be enriched in the CFC-containing fragment were pooled
and concentrated to dryness under vacuum. The pellet was
dissolved in 160 mM Mes, pH 6.0, 20% (v/v) 2-propanol,
1 mM CaCl2. Digestion was carried out at room temperature
using the following regimen: 1.8 µg of thermolysin was
added at time 0 and after 24 h, and then 1.8 µg of
endoproteinase Lys-C was added after 48 h. The progress
of the digestion reaction was monitored by MALDI-TOF MS.
For this purpose, aliquots were removed after 24,
48 and 72 h and desalted using Millipore C18 ZipTips™
with or without reduction with 25 mM dithiothreitol in 4 M
guanidine HCl, 80 mM Tris HCl, pH 8.5, at room
temperature for 1–2 h prior to mass analysis. The enzymatic
digest was stopped by the addition of solid urea to 6 M. The
thermolysin and endoproteinase Lys-C digests were
separated on an rp-HPLC Vydac C18 column using a 100-min
linear gradient of 0–75% acetonitrile. Solvent A was 0.03%
(v/v) trifluoroacetic acid in water, and solvent B was 0.024%
(v/v) trifluoroacetic acid in 75% (v/v) acetonitrile.
One-minute fractions were collected and concentrated to dryness
under vacuum, and the residue was resuspended in 5 µL of
0.1% (v/v) trifluoroacetic acid, 30% (v/v) acetonitrile.
Peptide analysis using MALDI-TOF MS
MALDI-TOF MS was carried out on a Voyager-DE™
STR mass spectrometer (Applied Biosystems) in either
linear or reflector mode using α-cyano-4-hydroxycinnamic
acid as matrix. Results generated using the linear mode are
expressed as average, protonated masses; those collected in
the reflector mode, as protonated, monoisotopic masses. An
aliquot of 0.5 µL or 1 µL of each test sample was applied to
the target plate. After partial evaporation of the sample
droplet at room temperature, 0.5 µL of matrix (10 mg·mL⁻¹
in 50% acetonitrile/0.1% trifluoroacetic acid, v/v) was
applied. Data acquisition and analysis were controlled by
GRAMS/32 software (version 4.11, Level 2).
Fig. 1. The predicted sequence of human Cripto-1 encoded from DNA is
shown: the signal peptide is italicized and corresponds to residues 1–30.
Cripto del-C covers residues 31–169, with the CFC domain in bold.
The arrow indicates the predicted ω-site for propeptide cleavage and
GPI-attachment; the signal region for GPI anchorage is underlined.
N-terminal sequencing
Sequencing was carried out on an Applied Biosystems
Procise 494 cLC sequencer that was run in the pulsed liquid
mode. The resulting PTH (phenylthiohydantoin) amino
acids were separated using an ABI 140D Solvent Delivery
System with a 0.8-mm × 250-mm C18 PTH column and
were monitored on-line using an ABI 785A programmable
absorbance detector. Data were analyzed using the
ABI 610A data analysis software.
Homology search
A BLAST search [21] of the SWISSPROT database was
carried out using the primary sequence of the CFC region
(residues 113–154) of human Cripto as the query motif.
Disulfide pattern search
The experimentally defined disulfide bond pattern of the
CFC region of Cripto was used to query an in-house
disulfide database built from annotations in SWISSPROT.
The search method reports all proteins with the same
disulfide topology, e.g. C1-C4/C2-C6/C3-C5 (C1 is the first
cysteine in the domain, C2 is the second, C3 is the third,
etc.), and ranks them according to sequence spacing
between the cysteines.
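The search described above can be illustrated with a minimal sketch that turns a set of disulfide-bonded residue pairs into a topology string and computes the spacing between consecutive cysteines. The residue numbers are those reported for the Cripto CFC domain; the function names and the spacing-distance score are assumptions made for illustration, since the text does not specify how the in-house ranking is computed.

```python
from typing import List, Sequence, Tuple


def topology(pairs: Sequence[Tuple[int, int]], cysteines: Sequence[int]) -> str:
    """Return a topology string such as 'C1-C4/C2-C6/C3-C5'.

    Cysteines are numbered C1, C2, ... in sequence order.
    """
    order = {res: i + 1 for i, res in enumerate(sorted(cysteines))}
    labelled = sorted(tuple(sorted((order[a], order[b]))) for a, b in pairs)
    return "/".join(f"C{a}-C{b}" for a, b in labelled)


def spacing(cysteines: Sequence[int]) -> List[int]:
    """Number of residues between consecutive cysteines."""
    ordered = sorted(cysteines)
    return [b - a - 1 for a, b in zip(ordered, ordered[1:])]


def spacing_distance(query: Sequence[int], candidate: Sequence[int]) -> int:
    """Hypothetical ranking score: summed absolute difference in spacing
    (lower means more similar). Not the authors' actual metric."""
    return sum(abs(q - c) for q, c in zip(spacing(query), spacing(candidate)))


cripto_cys = [115, 128, 131, 133, 140, 149]
cripto_pairs = [(115, 133), (128, 149), (131, 140)]
cripto_topology = topology(cripto_pairs, cripto_cys)
cripto_spacing = spacing(cripto_cys)
```

Database entries sharing the same topology string would then be sorted by a score such as `spacing_distance` against the query.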
Comparative modeling
The 3-D structures of the EGF-like domain and the CFC
domain were modeled separately using the MODELER
module [22] of the INSIGHT II software package (Accelrys,
Inc., San Diego, CA, USA). The NMR structure of mouse
EGF [Protein Data Bank code (pdb): 1epi] was used for
modeling of the EGF-like domain. The CFC domain was
built using the NMR structure of the proteinase inhibitor,
PMP-C (pdb: 1pmc).
Motif search
Motif elements, identified independently by homology or
disulfide pattern, were used to query the nonredundant
database TREMBL using the DART program (a domain
motif search algorithm).
Results
The Cripto CFC domain is a functional unit
The predicted amino acid sequence (Fig. 1) of mature
human Cripto contains 12 cysteine residues, six in the EGF
domain and six in the CFC domain. To test whether the
CFC domain could retain its function independent of the
EGF domain, we generated a soluble form of the CFC
domain, comprising the signal peptide and amino acids
112–169, fused to the hinge and Fc region of human IgG1
(CFC-Fc), and tested its ability to bind to ALK4.
Previously, we showed in a FACS assay that full length
Cripto (amino acids 1–169) fused to human Fc (Cripto
delC-Fc) bound to human 293 cells expressing ALK4, but
not to control 293 cells lacking ALK4 [19]. We have now
evaluated the binding of soluble CFC-Fc to ALK4-expressing
293 cells by FACS assay, using Cripto delC-Fc as
the positive control and LTβR-Fc as a negative control.
Figure 2A shows the results of this comparison. A
significant shift in mean fluorescence for ALK4-293 cells was seen
in the presence of either Cripto delC-Fc or CFC-Fc, but not
with LTβR-Fc (Fig. 2A2). A small shift was also seen with
EGF-Fc, but this shift was not dependent on ALK4
expression. These experiments also show that the shift in
mean fluorescence for CFC-Fc (Fig. 2A3) binding to
ALK4-293 cells is of similar magnitude to the shift of the
positive control, Cripto delC-Fc (Fig. 2A1), and therefore
that the CFC domain is sufficient for the interaction of
Cripto with ALK4.
To verify the role of the CFC domain in ALK4 binding,
we analyzed the effects of point mutations in the EGF and
CFC domains by FACS analysis, using mutations in the
EGF and CFC domains known to disrupt function,
specifically either downstream signaling or ALK4 binding
[12,16]. We have also compared the ability of both types of
mutants, i.e. the CFC domain mutant, H120G/W123G, and
the EGF domain mutant, N85G/T88A, to bind to ALK4
by FACS (Fig. 2B). The results showed that ALK4-Fc
binds well to cells expressing either wild type Cripto
(Fig. 2B1) or the EGF domain mutant, N85G/T88A
(Fig. 2B2), but does not bind to cells expressing the CFC
mutant, H120G/W123G (Fig. 2B3). This and the previous
experiments demonstrate that the CFC domain is involved
in ALK4 binding.
Fig. 2. FACS analysis of the interactions between Cripto and ALK4.
(A) Incubation of soluble Cripto delC-Fc (A1), EGF-Fc (A2), and
CFC-Fc (A3) with 293 cells expressing ALK4. The cells expressing
ALK4 (bold, solid curve) are compared to the control cells that do not
express ALK4 (solid curve). Incubation of LTβR-Fc with 293 cells
expressing ALK4 was used as a control for the Fc portion of the
proteins (dashed curve). (B) Incubation of ALK4-Fc with 293 cells
expressing full length wild-type Cripto (B1), Cripto N85G/T88A (B2),
or Cripto H120G/W123G (B3). Cells expressing Cripto or mutants
(bold, solid curve) are compared to the control cells that do not express
any Cripto proteins (solid curve).
Determination of disulfide linkages in the CFC domain
The presence of free thiol groups in the protein was tested
by alkylation of the protein with NEM under nonreducing
conditions, followed by alkylation with IAM under reducing
conditions. Alkylation of a cysteine
with NEM will add 125.1 Da to the mass of the protein or
peptide, whereas alkylation with IAM will add a mass of
56.9 Da. The results from ESI mass spectrometric analysis
showed a range of molecular masses corresponding to
residues 74–169 completely alkylated with IAM, with
heterogeneity in glycosylation. Masses corresponding to
protein containing 2-(N-ethylsuccinimidyl)-cysteine
(NES-Cys) residues were not detected. Therefore, we conclude that
all of the cysteine residues in the protein are disulfide-linked.
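The logic of the two-step alkylation can be made explicit with a small calculation: any cysteine that was free before reduction carries NEM, and every cysteine released by reduction carries the IAM label. The sketch below is a minimal illustration that uses the per-cysteine mass increments exactly as quoted in the text and assumes the 12 cysteines of the EGF and CFC domains; the function name is hypothetical.

```python
NEM_SHIFT_DA = 125.1  # mass added per NEM-labelled cysteine, as quoted in the text
IAM_SHIFT_DA = 56.9   # mass added per IAM-labelled cysteine, as quoted in the text


def expected_shift(n_free_thiols: int, n_cysteines: int = 12) -> float:
    """Total mass added to the reduced protein when free thiols are labelled
    with NEM and all remaining (disulfide-bonded) cysteines with IAM."""
    if not 0 <= n_free_thiols <= n_cysteines:
        raise ValueError("n_free_thiols must lie between 0 and n_cysteines")
    return n_free_thiols * NEM_SHIFT_DA + (n_cysteines - n_free_thiols) * IAM_SHIFT_DA


all_disulfide = expected_shift(0)   # every cysteine labelled with IAM
one_free = expected_shift(1)        # one NES-Cys plus eleven IAM labels
per_thiol_difference = one_free - all_disulfide
```

Each free thiol would therefore raise the observed mass by about 68 Da relative to the fully IAM-alkylated species, which is the signature whose absence supports the conclusion above.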
To study the disulfide structures of the CFC domain,
a double cleavage strategy was developed using CNBr
treatment followed by endoproteinase Lys-C cleavage. This
strategy took advantage of a Lys residue (Lys112) between
the EGF-like domain and the CFC domain and a Met
residue (Met154) between the last Cys in the CFC domain
and the O-linked glycosylation site at residue Ser161. The
dual digest was then separated by rp-HPLC and the fractions
containing the CFC domain were identified by MALDI-TOF MS
and were pooled for further analysis (see below).
In the CFC region, there are three Lys residues that might
be cleaved by endoproteinase Lys-C and two Trp residues
that could be oxidized during CNBr treatment [23].
Additional cleavage can take place on the C-terminal side
of oxidized Trp [24]. The observed protonated mass (MH+)
of the major component in the pooled fractions containing
the CFC domain was 4702.4 Da (Fig. 3), which is
consistent with fragments having either two oxidized Trp residues
and one cleavage at a Lys residue or one oxidized Trp and
cleavages at two of the Lys residues. In-source fragment
ions, MH+ = 1599.5 Da and MH+ = 3105.9 Da (Fig. 3),
indicated that the 4702-Da component was derived mainly
from the peptides 113–126 (calculated m/z = 1599.8) and
127–154 (calculated m/z = 3106.8), linked by a disulfide
bond. A minor component generated by an additional
cleavage after oxidized Trp123 (calculated MH+ =
4365.12 Da, based on disulfide-linked peptides 113–123
and 127–154) was also identified (Fig. 3). The pooled
fractions were also analyzed by MALDI-TOF MS after
reduction. The results support the identification of the
CFC peptides predicted from in-source fragmentation.
N-terminal sequencing results also supported this
interpretation (data not shown).
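The relationship between the masses of the two component peptides and the mass of the disulfide-linked species can be written as a short calculation. This is a minimal sketch, assuming singly protonated ions and average masses (linear mode): forming one disulfide bond removes two hydrogens, and the linked species carries one proton rather than two. The function name is hypothetical.

```python
HYDROGEN_AVG = 1.00794  # average mass of a hydrogen atom in Da
PROTON = 1.00728        # mass of a proton in Da


def disulfide_linked_mh(mh_a: float, mh_b: float) -> float:
    """MH+ of two peptides joined by one disulfide bond, computed from the
    MH+ values of the separate reduced peptides."""
    return mh_a + mh_b - 2 * HYDROGEN_AVG - PROTON


# Calculated values quoted in the text for peptides 113-126 and 127-154.
linked = disulfide_linked_mh(1599.8, 3106.8)
```

With the calculated values quoted above, the result is about 4703.6 Da, close to the 4702.4 Da observed for the major component in linear mode.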
The CFC domain-containing fractions were further
digested with thermolysin, followed by endoproteinase
Lys-C. Twenty percent 2-propanol was added to the digest
to promote preferential cleavages by thermolysin at the
N-terminus of leucine, isoleucine, and phenylalanine [25].
The extent of proteolytic cleavage between cysteine residues
was monitored by MALDI-TOF MS after reduction (data
not shown). Figure 4 shows the mass spectrum of the
nonreduced digest after all enzyme treatments. For
simplicity, we designate the first cysteine residue in the
CFC domain C1, the second C2, the third C3, etc., and
use this nomenclature in the following discussion.
Interpretation of the data for the nonreduced sample is
supported by identification of the peptides necessary to
form the predicted disulfide bonds. For example, the mass
signal detected at m/z = 2243.1 was interpreted as a
disulfide-linked component composed of peptides 113–126
[C1 (Cys115)] and 133–137 [C4 (Cys133)] (Fig. 4).
The corresponding peptide 113–126 (calculated m/z = 1598.7)
and peptide 133–137 (calculated m/z = 646.2) were detected
both under reducing
conditions and as in-source fragment ions derived from the
disulfide-linked peptide (Fig. 4). The mass spectrometric
data also clearly demonstrate that C3 (Cys131) forms a
disulfide bond with C5 (Cys140) as evidenced by
disulfide-linked peptides at masses 957.5, 1066.6, 1123.7, and 1194.7.
In addition, certain in-source fragments expected from these
disulfide-linked peptides are present, i.e. at 763.4 and
834.5 Da (Fig. 4). As C1 is disulfide-bonded to C4 and
C3 is disulfide-bonded to C5, it can be deduced that C2
must be linked to C6, although the corresponding mass was
not detected, presumably due to ion suppression. To
confirm this deduction, the thermolysin digest was separated
by rp-HPLC and the peaks were analyzed by MALDI-TOF MS.
In one of the major peaks, masses of 1042.5 and
914.5 corresponding to the disulfide-linked peptides
FLPGC(6)DG with KC(2)S and FLPGC(6)DG with
C(2)S, respectively, were detected. Other disulfide-linked
peptides, such as C1-C4 and C3-C5, were also identified by
MALDI-TOF MS in different fractions. The fractions
containing disulfide-linked peptides were evaluated by
N-terminal sequencing. The mass spectrometric and
N-terminal sequencing results confirmed that C1 is linked
to C4, C2 to C6, and C3 to C5.
Fig. 3. MALDI-TOF mass spectrum of the CFC domain-containing
fractions under nonreducing conditions. Peptide a,
ENCGSVPHDTW(ox)LPK; peptide b, ENCGSVPHDTW(ox); and peptide c,
KCSLCKCW(ox)HGQLRCFPQAFLPGCDGLVM. The spectrum was
obtained in the linear mode and all masses correspond to protonated
average masses. Oxidized Trp residues are represented as W(ox) and the
Met residue converted to homoserine lactone is in italics. Masses
corresponding to intact CFC domain were not present. In-source
fragments are indicated with asterisks.
Fig. 4. MALDI-TOF mass spectrum of the nonreduced CFC domain
after all enzymatic treatments. The spectrum was derived in the
reflector mode and all masses correspond to protonated monoisotopic
mass. Enzyme fragment peaks are identified with asterisks and
in-source fragments are underlined.
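The C1 to C6 nomenclature can be tied back to residue numbers with a short sketch. It assumes the CFC sequence for residues 113–154 as assembled from peptides a and c of Fig. 3 (taking the 28-residue reading of peptide c implied by the numbering 127–154); the variable and function names are hypothetical.

```python
# Residues 113-154 of human Cripto: peptide a (113-126) + peptide c (127-154).
CFC_SEQUENCE = "ENCGSVPHDTWLPK" + "KCSLCKCWHGQLRCFPQAFLPGCDGLVM"
CFC_START = 113  # residue number of the first character


def cysteine_labels(sequence: str, start: int) -> dict:
    """Map labels C1, C2, ... to residue numbers, in sequence order."""
    positions = [start + i for i, aa in enumerate(sequence) if aa == "C"]
    return {f"C{n}": pos for n, pos in enumerate(positions, start=1)}


labels = cysteine_labels(CFC_SEQUENCE, CFC_START)

# Connectivity determined in the text, expressed with the C1-C6 labels.
connectivity = [("C1", "C4"), ("C2", "C6"), ("C3", "C5")]
residue_pairs = [(labels[a], labels[b]) for a, b in connectivity]
```

The resulting residue pairs are Cys115/Cys133, Cys128/Cys149 and Cys131/Cys140, the same pairing stated in the summary of the paper.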
Primary structure search
The amino acid sequence information for the CFC region
of human Cripto was used to carry out a BLAST search of
the combined SWISSPROT/TREMBL database. An
initial search showed matches to the VWFC (von
Willebrand Factor C)-like domain in human and chicken
α-1 collagen, mouse and human NELL 2, and chicken
NEL, with low homology (e-value > 0.1), in addition to
other Cripto and Cripto-like proteins. The VWFC
domain is defined by a pattern of 10 cysteine residues
of undetermined connectivity, but the similarity of the
Cripto CFC domain to the above-listed proteins is
confined to the portion of the VWFC motif containing
the first six cysteine residues.
Motif search
Subsequent searches of the protein database with DART,
using combined EGF-like/VWFC sequences as queries,
provided additional matches, some of which are listed in
Table 1. Many of the identified proteins contain both
EGF-like and VWFC domains, but only in human Jagged 1 and
Jagged 2, Drosophila Serrate, and NELL 1 were both
domains adjacent to and in the same order as the putative
EGF-like and CFC regions of human Cripto. Furthermore,
only in Jagged 2 are the regions adjacent to the membrane
interface (transmembrane) region. To examine the strength
of this relationship, we aligned the amino acid sequences of
human Jagged 2 and Cripto (Fig. 5). The conservation of
residues such as cysteine, proline, glycine, and tryptophan,
which are important for protein folding, is highlighted in
the alignment [26].
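Sequence identity over the alignment length, the measure quoted for the Cripto/Jagged 2 alignment in Fig. 5, can be computed as sketched below. The two aligned strings are short hypothetical examples, not the actual alignment, and the function name is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percentage of alignment columns holding the same residue in both
    sequences; columns containing a gap count toward the length only."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for x, y in zip(aligned_a, aligned_b) if x == y and x != gap
    )
    return 100.0 * matches / len(aligned_a)


# Hypothetical five-column alignment: C, P and G are identical.
toy_identity = percent_identity("CP-GW", "CPAGF")
```

Other conventions divide by the number of ungapped columns or by the shorter sequence length, so the denominator should always be stated alongside the percentage.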
Disulfide pattern search and comparative modeling
The disulfide pattern that was determined experimentally
for human Cripto (i.e. C1-C4, C2-C6, C3-C5) was used to
query a disulfide database compiled from SWISSPROT
annotations (van Vlijmen, H. W. T., Gupta, A. & Singh, J.,
Biogen Corp., unpublished observations). This is an
orthogonal method for exploring relationships, and revealed two
small, structurally related serine protease inhibitors,
PMP-D2 and PMP-C [27], that were not uncovered using BLAST
on the SWISSPROT/TREMBL database. Based on the
NMR structure of PMP-C (Protein Data Bank code, 1pmc)
and the sequence alignment shown in Fig. 6, a 3-D model
was built for the CFC domain of Cripto (Fig. 7). In the
computed model of the Cripto CFC domain, one side of the
molecule has a high concentration of hydrophobic residues,
including Trp134, Leu138, Phe141, Pro142, Phe145, and
Leu146. These hydrophobic residues are on the side of the
protein opposite to residues His120 and Trp123 that have
been implicated in binding of Cripto to ALK4. The
hydrophobic residues may play a role in the folding of
Cripto by interacting with the EGF-like domain, or they
may constitute the interaction site with other signaling
components. A 3-D model of the EGF-like domain of
Cripto was also built, based on the NMR structure of
murine EGF (Protein Data Bank code, 1epi), by aligning
the cysteine residues as described previously [28]. Two
theoretical models of the full-length Cripto protein were
constructed by connecting the EGF and CFC modules. In
the first model (Fig. 7A) the domains are arranged in an
extended conformation, analogous to the conformation
found for the solution structure of a covalently linked pair
of EGF domains from human fibrillin-1 [29]. The second
model has a more globular structure in which the EGF and
CFC modules have a large number of noncovalent contacts
(Fig. 7B), analogous to the crystal packing of the EGF-like
domains from human factor IX [30].
Table 1. Summary of some of the proteins identified as containing VWFC-like domains. Definitions of motifs as VWFC-like are based on
SWISSPROT annotations and NCBI DART predictions.
Protein name                     No. of EGF-like   No. of VWFC-like   No. of Cys in
                                 domains           domains            VWFC-like domains
Cripto (human or mouse)          1                 1                  6
VWF (human and porcine)          0                 3                  10
NEL (chicken)                    6                 5                  10 in domains 1–4, 8 in domain 5
NELL1 (rat)                      6                 5                  10 in domains 1–4, 8 in domain 5
NELL2 (human and rat)            6                 5                  10 in domains 1–4, 8 in domain 5
Protein kinase C (BP)            6                 5                  10 in domains 1–4, 8 in domain 5
Jagged 1 and Jagged 2 (human)    15                1                  10
Serrate (Drosophila)             14                1                  10
Fig. 5. Alignment of the sequences of human Cripto and human Jagged 2. Conserved residues are framed with solid lines and homologous residues
are framed with dashed lines. The sequence identity over the alignment length is 26%.
Fig. 6. Alignment of the sequences of the CFC domain of human Cripto and PMP-C. Conserved residues are framed with solid lines and homologous
residues are framed with dashed lines.
Fig. 7. Model for EGF-CFC domains of Cripto. Hypothetical structures for EGF-CFC domains are shown in an extended conformation (A) or
closed conformation (B). In the CFC domain, disulfide bonds Cys115-Cys133, Cys128-Cys149, and Cys131-Cys140 are indicated by DS1, DS2 and
DS3, respectively. Residues H120 and W123 have been implicated in ALK4 binding and are shown in purple. Residues N79 and T88 (shown in red)
are modified through N-linked glycosylation and O-linked fucosylation, respectively. Residues N79, N85, R104, and E107 (the latter three shown in
blue) have been shown to be important in Nodal induction of Smad2 phosphorylation [16]. Nter designates the location of the expressed
amino-terminus; Cter designates the location of the expressed carboxyl-terminus.
We have also modeled the EGF-like (15th EGF domain)
and adjacent VWFC domains of human Jagged 2, using
the same approach described for the Cripto EGF-like and
CFC domains, and found that there are no structural
incompatibilities.
Discussion
We have used chemical and enzymatic fragmentation, mass
spectrometry, and N-terminal sequence analysis to
characterize the disulfide linkages of the cysteine residues in the
CFC region of human Cripto. From these studies, we show
that the six cysteines are linked in three disulfide bonds,
C1-C4, C3-C5, C2-C6. We performed these experiments on
a truncated, recombinant version of human Cripto,
containing residues 31–169 of wild type human Cripto. We
consider the results a valid representation of the wild type
structure because a seventh Cys residue, Cys181, is located
in the predicted GPI ‘signal sequence’ that would normally
be cleaved off during processing of the wild type protein
[31,32]. Furthermore, it has been demonstrated that this
soluble, C-terminally truncated recombinant human Cripto
protein is biologically active [2]. Using both the primary
sequence of the CFC region and the experimentally defined
disulfide pattern to query protein sequence and disulfide
databases, we obtained matches to a group of proteins
containing a VWFC-like motif. The VWFC-like motif is
believed to play an important role in the formation of
certain protein complexes, examples including
thrombospondin 1 (TSP1), which binds to CD36 on endothelial cells
[29], and procollagen IIA and chordin, which bind to bone
morphogenetic protein [33]. The binding properties of these
proteins have led to the hypothesis that proteins containing
VWFC-like domains (Cys-rich) act as ‘TGFβ sinks’ in
modulating development [33]. Although most of the
documented VWFC-like motifs contain 10 cysteine residues,
there are several instances where such regions have fewer
than 10, e.g. the C-terminal VWFC domain of NEL
(chicken), NELL 1 (rat) and NELL 2 (human and rat), and
the last VWFC region of murine tectorin, all of which
contain only eight cysteine residues. In all of the examples of
proteins containing shortened VWFC-like domains, the
motif is abbreviated by loss of the C-terminal region,
covering residues Cys9 and Cys10. These observations
suggest that the CFC region in Cripto can be considered as a
truncated form of the VWFC-like domain. Assuming that
the CFC region of Cripto is VWFC-like, we infer that the
EGF-CFC family of proteins is a variation of an already
described theme for which there are many examples in
modular proteins. Among them are several that have the
juxtaposition of the EGF and CFC domains seen in Cripto,
for example, NELL 1, NELL 2, JAGGED 1 and
JAGGED 2, in which at least one of the EGF-like domains
is N-terminal to a VWFC-like domain (Table 1). For
JAGGED 1 and JAGGED 2, the similarity extends to the
position of the membrane attachment sequence, specifically,
a trans-membrane domain that is C-terminal to the
VWFC-like domain, and we found that there was a striking
degree of structural similarity between Cripto EGF-CFC
and human JAGGED 2 (Fig. 5). As with Cripto, human
JAGGED 2 is involved in signal transduction as a ligand
for the NOTCH receptor, another EGF homolog [34].
Moreover, similar to Cripto, a major function of
JAGGED 2 is in patterning and morphogenesis in early
embryonic development [35,36]. Although JAGGED 2 is
not fucosylated as Cripto is [2,12], the function of the NOTCH
ligand is reportedly regulated by fucosylation of the Notch
receptor [35]. The specific roles of the individual domains of
human JAGGED have not been delineated, but Serrate, the
Drosophila version of JAGGED, has been investigated.
Hukriede et al. [37] have shown that a truncated form of
Serrate, lacking the VWFC region [38], binds to NOTCH
but does not activate NOTCH signaling. The functions of
the domains in Cripto are still being investigated, but initial
information published previously by Yeo et al. [16] and
described here indicates that the EGF and CFC domains
have different functions. Yeo et al. showed that ALK4 was
coimmunoprecipitated with the CFC domain of murine
Cripto, but not with the CFC mutant (H120G/W123G)
[16]. Here we have confirmed and expanded upon these
findings using ALK4 and human Cripto, and have
demonstrated that the CFC domain alone is sufficient for ALK4
binding. These experiments highlight the important role of
the CFC domain, like other VWFC-domains [29,33], in
complex formation.
Recently, Minchiotti et al. [7] postulated a structural
model ofhumanCripto based on the beta-trefoil fold of
basic FGF. In this model, the EGF-like and CFC domains
form the second and third lobes ofthe trefoil structure,
respectively. We now believe this model to be incorrect
because it cannot accommodate the actual disulfide
connectivities in the CFC domainofCripto described
here. Using our experimentally determined disulfide pat-
tern in the CFC domain to search a disulfide database
compiled from SWISSPROT, we identified a structurally
known homologue, chymotrypsin inhibitor PMP-C.
Because of amino acid sequence similarities and disulfide
linkage identity between the Cripto CFC domain and
PMP-C, we built a model of the Cripto CFC domain using
the NMR solution structure of PMP-C as a template
(Fig. 7). Our model is consistent with data from previous
functional studies [7,16,19], as well as from the current
study, in particular, the observation that mutations in the
CFC domain at His120 and Trp123 abolish ALK4
interactions (Fig. 2B). In our model (Fig. 7), the side-
chains of His120 and Trp123 are solvent-exposed, allowing
for possible protein–protein interaction. Interestingly, in
our CFC model, we have identified a hydrophobic patch
consisting of Trp134, Leu138, Phe141 and Pro142. Leu138
and homologues of Trp134 and Phe141 are conserved
throughout the Cripto family [1] and are clustered on the
side of the CFC domain opposite the presumed ALK4
binding site (which includes His120 and Trp123). This
hydrophobic patch may be important for protein–protein
interactions.
Two possible structural models for full-length Cripto
protein – a linear (open) configuration (Fig. 7A) and a
closed configuration (Fig. 7B) – have been constructed by
connecting an EGF-like module [28] and our CFC module
(Fig. 7). However, at this point we do not have enough data
to favor one model over the other. Both models fulfill the
predictions for the structure of the EGF-like domain,
namely, solvent exposure of the fucosylation site at Thr88
and the N-linked glycosylation site at Asn79, and both
allow for potential protein–protein interactions via the
above-described hydrophobic patch. Structure determination
of human Cripto by NMR is in progress to address
these questions.
In summary, the disulfide bond pattern for the six
cysteine residues in the CFC domain of human Cripto has
been experimentally defined as C1-C4, C2-C6, C3-C5, and
biochemical studies have shown that the CFC domain binds
to ALK4 independent of the EGF domain. Database
searches based on the primary sequence have uncovered
similarities between Cripto EGF-CFC domains and the
EGF-VWFC domains of the C-terminal extracellular
portions of Jagged 1 and Jagged 2. A 3-D structural model
of the CFC domain was constructed based on the NMR
structure of PMP-C, a serine protease inhibitor having the
same disulfide connectivity. This model revealed a hydro-
phobic patch that is probably important for protein binding.
Two possible models for intact Cripto have also been
proposed. By exploring the structural features of Cripto, as
defined by our models, we hope to increase the understanding
of the role of Cripto in the Nodal signal transduction
pathway.
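The C1-C4, C2-C6, C3-C5 notation above follows from ranking the six cysteines of the CFC domain by sequence position. As an illustrative sketch only (the function names `connectivity` and `label` are hypothetical and are not part of the methods used in this study), the conversion from the residue-numbered bonds Cys115-Cys133, Cys128-Cys149 and Cys131-Cys140 to this ordinal pattern can be written as:

```python
def connectivity(pairs):
    """Convert disulfide pairs given as residue numbers into ordinal pairs.

    Cysteines are ranked by sequence position (rank 1 is the first
    cysteine), and each bond is returned as a sorted tuple of ranks.
    """
    positions = sorted({res for pair in pairs for res in pair})
    rank = {res: i + 1 for i, res in enumerate(positions)}
    return sorted(tuple(sorted((rank[a], rank[b]))) for a, b in pairs)


def label(pattern):
    """Format an ordinal pattern as a string such as 'C1-C4, C2-C6'."""
    return ", ".join(f"C{a}-C{b}" for a, b in pattern)


# Disulfide bonds of the human Cripto CFC domain reported in this study.
cfc_pairs = [(115, 133), (128, 149), (131, 140)]
cfc_pattern = connectivity(cfc_pairs)
print(label(cfc_pattern))
```

Because the ordinal pattern does not depend on absolute residue numbers, two domains with different numbering can be compared by testing their patterns for equality, which is the sense in which the CFC domain and PMP-C are said to share the same disulfide connectivity.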
Acknowledgements
We would like to thank Dr R. Blake Pepinsky for his review and editing
of this manuscript. We would also like to thank Dr Joseph Rosa, Drs
Kevin Williams, Alphonse Galdes, and Alex Buko for their valuable
insights.
References
1. Colas, J.F. & Schoenwolf, G.C. (2000) Subtractive hybridization
identifies chick-cripto, a novel EGF-CFC ortholog expressed
during gastrulation, neurulation and early cardiogenesis. Gene
255, 205–217.
2. Salomon, D.S., Bianco, C., Ebert, A.D., Khan, N.I., De Santis,
M., Normanno, N., Wechselberger, C., Seno, M., Williams, K.,
Sanicola, M., Foley, S., Gullick, W.J. & Persico, G. (2000) The
EGF-CFC family: novel epidermal growth factor-related proteins
in development and cancer. Endocr. Relat. Cancer 7, 199–226.
3. Bamford, R.N., Roessler, E., Burdine, R.D., Saplakoglu, U., dela
Cruz, J., Splitt, M., Towbin, J., Bowers, P., Ferrero, G.B., Marino,
B., Schier, A.F., Shen, M.M., Muenke, M. & Casey, B. (2000)
Loss-of-function mutations in the EGF-CFC gene CFC1 are
associated with human left-right laterality defects. Nat. Genet. 26,
365–369.
4. Dono, R., Scalera, L., Pacifico, F., Acampora, D., Persico, M.G.
& Simeone, A. (1993) The murine cripto gene: expression during
mesoderm induction and early heart morphogenesis. Development
118, 1157–1168.
5. Ding, J., Yang, L., Yan, Y.T., Chen, A., Desai, N., Wynshaw-
Boris, A. & Shen, M.M. (1998) Cripto is required for correct
orientation of the anterior-posterior axis in the mouse embryo.
Nature 395, 702–707.
6. Gritsman, K., Zhang, J., Cheng, S., Heckscher, E., Talbot, W.S. &
Schier, A.F. (1999) The EGF-CFC protein one-eyed pinhead is
essential for nodal signaling. Cell 97, 121–132.
7. Minchiotti, G., Manco, G., Parisi, S., Lago, C.T., Rosa, F. &
Persico, M.G. (2001) Structure-function analysis of the EGF-CFC
family member Cripto identifies residues essential for nodal sig-
nalling. Development 128, 4501–4510.
8. Salomon, D.S., Bianco, C. & De Santis, M. (1999) Cripto: a novel
epidermal growth factor (EGF) -related peptide in mammary
gland development and neoplasia. Bioessays 21, 61–70.
9. Schier, A.F., Neuhauss, S.C., Helde, K.A., Talbot, W.S. &
Driever, W. (1997) The one-eyed pinhead gene functions in
mesoderm and endoderm formation in zebrafish and interacts
with no tail. Development 124, 327–342.
10. Xu, C., Liguori, G., Persico, M.G. & Adamson, E.D. (1999)
Abrogation of the Cripto gene in mouse leads to failure of post-
gastrulation morphogenesis and lack of differentiation of cardio-
myocytes. Development 126, 483–494.
11. Zhang, J., Talbot, W.S. & Schier, A.F. (1998) Positional cloning
identifies zebrafish one-eyed pinhead as a permissive EGF-related
ligand required during gastrulation. Cell 92, 241–251.
12. Schiffer, S.G., Foley, S.F., Kaffashan, A., Hronowski, X.,
Zichittella, A.E., Yeo, C.Y., Miatkowski, K., Adkins, H.B.,
Domon, B., Whitman, M., Salomon, D., Sanicola, M. &
Williams, K.P. (2001) Fucosylation of Cripto is required for its
ability to facilitate nodal signaling. J. Biol. Chem. 276,
37767–37777.
13. Ciccodicola, A., Dono, R., Obici, S., Simeone, A., Zollo, M. &
Persico, M.G. (1989) Molecular characterization of a gene of
the ‘EGF family’ expressed in undifferentiated human NTERA2
teratocarcinoma cells. EMBO J. 8, 1987–1991.
14. Shen, M.M., Wang, H. & Leder, P. (1997) A differential display
strategy identifies Cryptic, a novel EGF-related gene expressed in
the axial and lateral mesoderm during mouse gastrulation.
Development 124, 429–442.
15. Reissmann, E., Jornvall, H., Blokzijl, A., Andersson, O., Chang,
C., Minchiotti, G., Persico, M.G., Ibanez, C.F. & Brivanlou, A.H.
(2001) The orphan receptor ALK7 and the Activin receptor ALK4
mediate signaling by Nodal proteins during vertebrate develop-
ment. Genes Dev. 15, 2010–2022.
16. Yeo, C.Y. & Whitman, M. (2001) Nodal signals to Smads through
Cripto-dependent and Cripto-independent mechanisms. Mol. Cell
7, 949–957.
17. Bianco, C., Adkins, H.B., Wechselberger, C., Seno, M.,
Normanno, N., De Luca, A., Sun, Y., Khan, N., Kenny, N.,
Ebert, A., Williams, K.P., Sanicola, M. & Salomon, D. (2002)
Cripto-1 activates nodal- and ALK4-dependent and -independent
signaling pathways in mammary epithelial cells. Mol. Cell.
Biol. 22, 2586–2597.
18. Yan, Y., Liu, J., Luo, Y.E.C., Haltiwanger, R.S., Abate-Shen, C.
& Shen, M.M. (2002) Dual roles of Cripto as a ligand and
coreceptor in the nodal signaling pathway. Mol. Cell. Biol. 22,
4439–4449.
19. De la Cruz, J.M., Bamford, R.N., Burdine, R.D., Roessler, E.,
Barkovich, A.J., Donnai, D., Schier, A.F. & Muenke, M. (2002) A
loss-of-function mutation in the CFC domain of TDGF1 is
associated with human forebrain defects. Hum. Genet. 110,
422–428.
20. Pepinsky, R.B. (1991) Selective precipitation of proteins from
guanidine hydrochloride-containing solutions with ethanol. Anal.
Biochem. 195, 177–181.
21. Altschul, S.F., Gish, W., Miller, W., Myers, E.W. & Lipman, D.J.
(1990) Basic local alignment search tool. J. Mol. Biol. 215,
403–410.
22. Sali, A. & Blundell, T.L. (1993) Comparative protein modelling by
satisfaction of spatial restraints. J. Mol. Biol. 234, 779–815.
23. Morrison, J.R., Fidge, N.H. & Gergo, B. (1990) Studies on the
formation, separation, and characterization of cyanogen bromide
fragments of human AI apolipoprotein. Anal. Biochem. 186,
145–152.
24. Boulware, D.W., Goldsworthy, P.D., Nardella, F.A. & Mannik,
M. (1985) Cyanogen bromide cleaves Fc fragments of pooled
human IgG at both methionine and tryptophan residues. Mol.
Immunol. 22, 1317–1322.
25. Welinder, K.G. (1988) Generation of peptides suitable for
sequence analysis by proteolytic cleavage in reversed-phase high-
performance liquid chromatography solvents. Anal. Biochem. 174,
54–64.
26. Naismith, J.H. & Sprang, S.R. (1998) Modularity in the TNF-
receptor family. Trends Biochem. Sci. 23, 74–79.
27. Mer, G., Hietter, H., Kellenberger, C., Renatus, M., Luu, B. &
Lefevre, J.F. (1996) Solution structure of PMP-C: a new fold in
the group of small serine proteinase inhibitors. J. Mol. Biol.
258, 158–171.
28. Lohmeyer, M., Harrison, P.M., Kannan, S., DeSantis, M.,
O’Reilly, N.J., Sternberg, M.J., Salomon, D.S. & Gullick, W.J.
(1997) Chemical synthesis, structural modeling, and biological
activity of the epidermal growth factor-like domain of human
cripto. Biochemistry 36, 3837–3845.
29. Dawson, D.W., Pearce, S.F., Zhong, R., Silverstein, R.L., Frazier,
W.A. & Bouck, N.P. (1997) CD36 mediates the in vitro inhibitory
effects of thrombospondin-1 on endothelial cells. J. Cell Biol. 138,
707–717.
30. Rao, Z., Handford, P., Mayhew, M., Knott, V., Brownlee, G.G. &
Stuart, D. (1995) The structure of a Ca(2+)-binding epidermal
growth factor-like domain: its role in protein–protein interactions.
Cell 82, 131–141.
31. Ferguson, M.A. & Williams, A.F. (1988) Cell-surface anchoring of
proteins via glycosyl-phosphatidylinositol structures. Annu. Rev.
Biochem. 57, 285–320.
32. Englund, P.T. (1993) The structure and biosynthesis of glycosyl
phosphatidylinositol protein anchors. Annu. Rev. Biochem. 62,
121–138.
33. Larrain, J., Bachiller, D., Lu, B., Agius, E., Piccolo, S. & De
Robertis, E.M. (2000) BMP-binding modules in chordin: a model
for signalling regulation in the extracellular space. Development
127, 821–830.
34. Muskavitch, M.A. (1994) Delta-notch signaling and Drosophila
cell fate choice. Dev. Biol. 166, 415–430.
35. Hicks, C., Johnston, S.H., diSibio, G., Collazo, A., Vogt, T.F. &
Weinmaster, G. (2000) Fringe differentially modulates Jagged1
and Delta1 signalling through Notch1 and Notch2. Nat. Cell Biol.
2, 515–520.
36. Lanford, P.J., Lan, Y., Jiang, R., Lindsell, C., Weinmaster, G.,
Gridley, T. & Kelley, M.W. (1999) Notch signalling pathway
mediates hair cell development in mammalian cochlea. Nat. Genet.
21, 289–292.
37. Hukriede, N.A. & Fleming, R.J. (1997) Beaded of Goldschmidt,
an antimorphic allele of Serrate, encodes a protein lacking trans-
membrane and intracellular domains. Genetics 145, 359–374.
38. Hukriede, N.A., Gu, Y. & Fleming, R.J. (1997) A dominant-
negative form of Serrate acts as a general antagonist of Notch
activation. Development 124, 3427–3437.
3618 S. F. Foley et al. (Eur. J. Biochem. 270) © FEBS 2003