MINIREVIEW
Wheat germ cell-free platform for eukaryotic protein
production
Dmitriy A. Vinarov, Carrie L. Loushin Newman and John L. Markley
Center for Eukaryotic Structural Genomics, Biochemistry Department, University of Wisconsin-Madison, Madison, WI, USA
Introduction
One of the most important tasks in biotechnology
today is the development of improved systems and
strategies for synthesizing any desired protein or pro-
tein fragment in its folded, soluble form on a prepara-
tive scale. This task is fundamental to the success of
structural genomics projects, which promise to capital-
ize upon numerous advances in science and technol-
ogy to change the appreciation and understanding of
biological systems. Structural genomics implies a
move away from hypothesis-driven research to a sys-
tem of solving structures first and using these struc-
tures and other structures modeled from them as the
source of hypotheses for further research. The medical
incentives for understanding protein structure are
great. Many diseases are caused by defects in a single
protein that alter its folding, stability, or activity. The
structures of proteins involved in diseases will move
us a step closer to improving disease treatment, diag-
nosis, and prevention. Beyond their specific medical
applications, structural genomics projects are teaching
fundamental lessons about the structural basis of life
on this planet.
Keywords
cell-free extract; in vitro; isotopic labeling;
NMR screening; NMR structure
determination; protein production; protein
structure; transcription; translation; wheat
germ
Correspondence
J. L. Markley, Biochemistry Department,
University of Wisconsin-Madison, 433
Babcock Drive, Madison, WI 53706, USA
Fax: +1 608 262 3759
Tel: +1 608 263 9349
E-mail: markley@nmrfam.wisc.edu
Website: http://uwstructuralgenomics.org
(Received 2 May 2006, revised 13 July
2006, accepted 26 July 2006)
doi:10.1111/j.1742-4658.2006.05434.x
We describe a platform that utilizes wheat germ cell-free technology to
produce protein samples for NMR structure determinations. In the first stage,
cloned DNA molecules coding for proteins of interest are transcribed and
translated on a small scale (25 µL) to determine levels of protein expression
and solubility. The amount of protein produced (typically 2–10 µg) is
sufficient to be visualized by polyacrylamide gel electrophoresis. The fraction of
soluble protein is estimated by comparing gel scans of total protein and
soluble protein. Targets that pass this first screen by exhibiting high protein
production and solubility move to the second stage. In the second stage,
the DNA is transcribed on a larger scale, and labeled proteins are
produced by incorporation of [¹⁵N]-labeled amino acids in a 4 mL translation
reaction that typically produces 1–3 mg of protein. The [¹⁵N]-labeled
proteins are screened by ¹H-¹⁵N correlated NMR spectroscopy to determine
whether the protein is a good candidate for solution structure
determination. Targets that pass this second screen are then translated in a medium
containing amino acids doubly labeled with ¹⁵N and ¹³C. We describe the
automation of these steps and their application to targets chosen from a
variety of eukaryotic genomes: Arabidopsis thaliana, human, mouse, rat,
and zebrafish. We present protein yields and costs and compare the wheat
germ cell-free approach with alternative methods. Finally, we discuss
remaining bottlenecks and approaches to their solution.
Abbreviations
CESG, Center for Eukaryotic Structural Genomics; GST, glutathione S-transferase; HSQC, heteronuclear single-quantum correlation
spectroscopy; IMAC, immobilized metal affinity chromatography; PDB, Protein Data Bank; [U-¹⁵N]-, uniform labeling with nitrogen-15;
SAIL, stereo-array isotope labeled; Se-Met, selenomethionine.
4160 FEBS Journal 273 (2006) 4160–4169 © 2006 The Authors Journal compilation © 2006 FEBS
Protein production remains a bottleneck in proteo-
mics, for both structural and functional studies. Most
structural biology groups and structural genomics cen-
ters utilize cell-based, heterologous protein production
from Escherichia coli. However, this approach fails
with many individual proteins, particularly those from
eukaryotes. Failures result from no or low expression,
low solubility, or degradation. Expression levels can be
improved by producing the protein of interest as a
cleavable fusion with a highly expressing protein. Low
solubility can result from failure of the protein to fold
properly, aggregation of folded protein, or from unfav-
orable properties of the construct (intrinsic insolubility
of the native sequence or insolubility introduced by
a non-native sequence, such as a purification tag or
other cloning artifact). As indicated in TargetDB, the
target registration database for structural genomics
(http://targetdb.pdb.org/), the proportion of targets
that code for ‘unique proteins’ that yield soluble protein
is only about one-third for prokaryotic proteins
and much lower for eukaryotic proteins. In this context,
a unique protein is defined as one with a peptide
sequence exhibiting ≤ 30% sequence identity to the
sequence of any protein with a three-dimensional
structure deposited in the Protein Data Bank. Solubility
can be improved greatly by producing the protein
of interest as a cleavable fusion with a highly soluble
protein. This strategy may enable the protein to fold
properly without aggregation so that it stays in solu-
tion following cleavage. Many eukaryotic proteins are
‘natively disordered’, that is they do not adopt a sin-
gle, stable, folded structure. Some natively disordered
proteins require an additional factor for folding: a
metal ion, a small molecule cofactor, another peptide
chain, or an oligonucleotide. Other proteins may
require extensive post-translational modification to
achieve their native folded state. Platforms for struc-
tural investigations must support the production of
proteins on the scale of 2–10 mg. For efficient structure
determination by NMR spectroscopy, the proteins
must be labeled with stable isotopes (¹⁵N or ¹³C + ¹⁵N,
or for larger proteins ²H + ¹³C + ¹⁵N). For X-ray
crystallography, proteins normally are labeled with
selenomethionine (Se-Met) to support multiwavelength
anomalous dispersion data collection for phase
determination. Because protein production and labeling on
this scale is expensive, it is important to screen targets
first on a smaller scale to identify which constructs are
expressed, soluble without aggregation, folded, and
stable under the conditions used for NMR structure
determinations or crystallization trials.
In vitro cell-free methods for protein synthesis with
extracts from prokaryotic [1] or eukaryotic [2] cells
offer an alternative to the E. coli cell-based platforms.
Cell-free approaches have a number of potential
advantages over other alternatives to heterologous
expression in E. coli cells. Stable isotope or Se-Met
labeling is easier with cell-free systems than with yeast,
mammalian, or insect cell systems [3–5]. Cell-free sys-
tems may permit successful production of proteins that
undergo proteolysis [6,7] or accumulate in inclusion
bodies [8] in cells. Cell-free systems support selective
labeling strategies [9–12] that cannot be achieved in
bacterial whole cell systems. An important emerging
approach is the incorporation of stereo-array isotope
labeled (SAIL) amino acids [13], chemically synthesized
amino acids with stereo-specifically arrayed stable
isotope (²H and ¹³C) labeling patterns that are optimal
for NMR spectroscopy. SAIL amino acids are being
commercialized by a start-up company in Japan (Sail
Technologies, Inc., Yokohama, Japan) and when avail-
able will raise the threshold for high-throughput NMR
structure determinations from 20 kDa to 40 kDa and
above [13]. The SAIL amino acids must be incorpor-
ated into proteins by in vitro synthesis so as not to dis-
turb the labeling pattern.
Cell-free systems have been used for the production
of various kinds of proteins, including membrane pro-
teins [14] and proteins that are toxic to cells [8,15]. It
is possible to collect NMR spectra of [¹⁵N]-labeled
proteins prior to isolation from the cell-free protein
synthesis mixture [16,17]. One of the features of
cell-free protein production is that only the protein of
interest is labeled, so that contaminating proteins do
not show up in normal multinuclear NMR spectra.
Cell-free protein production protocols are streamlined
compared to cell-based protocols, in that they do not
require cell harvesting or cell lysis. Protein purification
is usually simpler, because the protein of interest starts
out more concentrated and is isolated from a smaller
set of contaminants.
The RIKEN Structural Genomics Center in colla-
boration with Roche has pioneered the use of cell-free
protein production through a coupled transcription-
translation system employing E. coli extracts [18–22].
It has been found, however, that most of the pro-
teins that produce well in E. coli cell-free systems are
the same ones that are produced successfully from
E. coli cells [10]. Thus, despite other potential advan-
tages, the E. coli cell-free approach may not greatly
expand the range of proteins that can be produced in
soluble, folded state, although it may be possible to
overcome this limitation by redesigning the gene
sequence (see below), by adding chaperones or other
factors [22,24], or by reengineering ribosomal proteins
[25].
One of the first in vitro translation systems to be
investigated was prepared from wheat germ extracts,
but yields from this eukaryotic extract were low [2].
Y. Endo and his group at Ehime University (Matsuyama,
Japan) achieved a breakthrough in this technology
by finding that an inhibitor of ribosomal protein
synthesis, tritin, is associated with the coat of the
wheat embryo [26]. They developed a process for
removing this contaminating inhibitor and patented
this process along with methods for utilizing the
improved wheat germ extract [26–31]. Endo founded
a company, CellFree Sciences Co., Ltd. (Yokohama,
Japan), to commercialize the technology. We found
this approach to be promising and formed a cooperative
undertaking with Ehime University and
CellFree Sciences Co., Ltd. with the goal of investigating
the potential of wheat germ cell-free protein
production as an enabling technology in our structural
genomics project, the Center for Eukaryotic
Structural Genomics (CESG; Madison, WI). As
discussed here, we have found this technology to be
robust, and our wheat germ cell-free pipeline now
supports high-throughput screening for protein
production and solubility and provides stable isotope
labeled protein samples for the majority of the NMR
structures determined at CESG [32,33].
CESG’s wheat germ cell-free platform
Our detailed protocol for wheat germ cell-free protein
production is available elsewhere [34]. In short, the
approach consists of four steps (Fig. 1A): (1) creation
of a plasmid used for in vitro transcription, (2) small
scale (25–50 µL) screening to assay the level of protein
production and solubility, (3) larger scale (4–12 mL)
production of [U-¹⁵N]-protein used to evaluate whether
solution conditions can be found that render the
target suitable for NMR structure determination
(soluble, monodisperse, folded, and stable), and (4)
production of sufficient [U-¹³C,¹⁵N]-protein for
multidimensional, multinuclear magnetic resonance data
collection. We purchase the wheat germ extract from
CellFree Sciences, Inc., the RNA polymerase from
Promega (Madison, WI), and the labeled amino acids
from Cambridge Isotope Laboratories (Andover,
MA). Details about these and other reagents and
supplies are found in our publications [32–34].
The purification workflow diagram is shown in
Fig. 1B. In step (1), a defined series of cloning
procedures is used to create a DNA plasmid containing
the target gene and 5′ and 3′ extensions that promote
efficient transcription and translation. In step (2), small
scale protein expression and purification trials are
carried out, generally in a 96 well format. Successful
candidates from these screens (those estimated to yield
> 0.5 mg·mL⁻¹ target protein with solubility > 75%)
are then selected for larger scale protein production
with incorporation of [¹⁵N]-labeled amino acids.
Purified [U-¹⁵N]-protein samples produced in step (3) are
then assayed by ¹H-¹⁵N correlation spectroscopy
(¹H-¹⁵N HSQC) for their suitability as structural
candidates (they must be folded, monodisperse, and stable
at room temperature for at least 14 days). The solution
conditions can be refined as part of this step. Targets
that pass these tests are then prepared as [U-¹³C,¹⁵N]-
protein samples, step (4).
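The two screening gates described above can be written down as simple predicates. The following is a minimal sketch, not CESG's actual software: the class, function, and field names are hypothetical, and only the numerical thresholds (> 0.5 mg·mL⁻¹, > 75% soluble, stable for at least 14 days) are taken from the text. The soluble fraction is assumed to be the ratio of gel-scan intensities for the soluble and total protein lanes.

```python
from dataclasses import dataclass

# Thresholds quoted in the text; all identifiers are illustrative assumptions.
MIN_YIELD_MG_PER_ML = 0.5      # estimated yield must exceed this
MIN_SOLUBLE_FRACTION = 0.75    # soluble / total must exceed this
MIN_STABLE_DAYS = 14           # stability at room temperature


@dataclass
class SmallScaleResult:
    target_id: str
    est_yield_mg_per_ml: float
    total_band: float     # gel-scan intensity, total protein lane
    soluble_band: float   # gel-scan intensity, soluble protein lane

    @property
    def soluble_fraction(self) -> float:
        # Guard against an empty lane (no detectable expression).
        if self.total_band <= 0:
            return 0.0
        return self.soluble_band / self.total_band


def passes_small_scale(result: SmallScaleResult) -> bool:
    """Step (2): expression and solubility screen."""
    return (result.est_yield_mg_per_ml > MIN_YIELD_MG_PER_ML
            and result.soluble_fraction > MIN_SOLUBLE_FRACTION)


def passes_hsqc_screen(folded: bool, monodisperse: bool, stable_days: int) -> bool:
    """Step (3): suitability as a structural candidate."""
    return folded and monodisperse and stable_days >= MIN_STABLE_DAYS


if __name__ == "__main__":
    r = SmallScaleResult("example-target", 0.8, total_band=100.0, soluble_band=90.0)
    print(r.target_id, passes_small_scale(r), passes_hsqc_screen(True, True, 14))
```

In practice the judgments "folded" and "monodisperse" come from inspection of the HSQC spectrum; they are reduced to booleans here only to make the decision logic explicit.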
[Fig. 1, panel A: Target selection; 1. Cloning – PCR from cDNA, ligation cloning, DNA plasmid prep; 2. Small scale – transcription, translation, screening (50 µL scale), analysis of expression level, solubility, and tag cleavage; successful targets proceed to production for structural analysis (4–12 mL scale); 3. Production and analysis of [¹⁵N]-protein – DNA plasmid preps, transcription, translation on [¹⁵N]-amino acids (4 mL reaction), isolation, purification (tag removal), HSQC NMR analysis, solubility, stability, and MS analysis; 4. Production of [¹³C,¹⁵N]-protein – as above but with double labeling (4–12 mL reaction), structure determination by NMR.]
[Fig. 1, panel B: Cell-free reaction (4–12 mL). Protein product with cleavable N-GST tag: 1st GSTrap column, concentrate, PreScission protease cleavage, 2nd GSTrap column. Protein product with noncleavable N-(His)₆ tag: Ni-HiTrap chelating column, concentrate. Both routes: Superdex75 in NMR buffer, concentration, NMR sample.]
Fig. 1. (A) Workflow diagram showing how the
wheat germ cell-free platform is used to
screen constructs for the expression of soluble
protein, to produce [¹⁵N]-labeled protein
for NMR screening for suitability as a structural
candidate, and for the production of
double-labeled [¹³C,¹⁵N]-protein for structure
determination. (B) Schematic illustration of
the steps involved in isolating and purifying
proteins produced by the wheat germ cell-free
platform depending on the type of tag:
noncleavable N-(His)₆ tag or cleavable N-GST
tag.
We have tested the wheat germ cell-free platform in
the context of NMR-based structural genomics of
eukaryotic proteins and have compared it with our
parallel E. coli cell-based platform. Our experience is
summarized briefly as follows. (a) Targets can be
screened more quickly and more economically for protein
expression and solubility by the cell-free approach
than by the cell-based approach. The efficiency of this
process is important, because we need to screen many
targets or multiple constructs of a given target in order
to find one that produces a protein that is soluble and
well folded. As an example of multiple screening of a
given target, we have screened targets with a
noncleavable His₆ tag, with a cleavable His₆ tag, and with a
cleavable glutathione S-transferase (GST) tag and have
shown complementary success with these [35]. (b)
Because of the smaller volumes involved, the isolation
and purification of 1–5 mg quantities of labeled protein
for NMR structural studies is faster and less labor
intensive with proteins prepared by the cell-free
approach than the cell-based approach. (c) Proteins
produced with the wheat germ extract from CellFree
Sciences and labeled amino acids generally show high
levels of enrichment by mass spectrometry: > 95%
¹⁵N/(¹⁴N + ¹⁵N) or ¹³C/(¹²C + ¹³C). These high levels
are excellent for NMR spectroscopy. (d) The cell-free
system supports the production of proteins with a
variety of labeling patterns: uniform labeling with ²H, ¹³C,
and ¹⁵N, selective labeling by residue type, and SAIL
(discussed above).
We recently carried out a detailed comparison of the
wheat germ cell-free and E. coli cell-based approaches
to protein production for NMR structure determination
[35]. In this study 96 randomly chosen Arabidopsis
thaliana targets were carried through CESG’s wheat
germ cell-free and E. coli cell pipelines. If possible,
[¹⁵N]-labeled versions of each protein were produced
for analysis by ¹H-¹⁵N correlation NMR spectroscopy.
Of the 96 targets started with, only eight from the
cell-free pipeline and five from the cell-based pipeline were
found suitable for NMR structural analysis on the
basis of the NMR results. In this comparison, the five
targets that proved successful by the E. coli cell-based
approach also were successful by the cell-free
approach.
Our wheat germ cell-free approach appears to have
advantages over published in vitro protein production
protocols that utilize E. coli S30 extract. (a) Cell-free
protocols utilizing E. coli extract usually call for the
testing of multiple plasmids with sequence differences
outside the protein coding region to determine one
that produces protein in high yield [36]. By contrast,
with the wheat germ cell-free protocol we have found
no advantage in modifying the plasmid sequence
outside the coding region, and hence utilize a single
plasmid construct for all targets. (b) Protocols for E. coli
S30 cell-free synthesis typically employ additives, such
as polyethylene glycol, to improve protein yields [10].
These additives need to be removed prior to NMR
structural studies. No such additives are required with
the wheat germ cell-free approach, probably because
the wheat germ extract contains chaperones and other
factors that contribute to higher protein yields. (c) To
achieve a high level of label incorporation from E. coli
S30 extract it may be necessary to take pains to
remove endogenous unlabeled amino acids [10]. (d)
Proteins prepared from E. coli S30 extract may be
heterogeneous as the result of incomplete cleavage of the
N-terminal methionine. This heterogeneity can lead to
doubling of NMR peaks [10]. An effective solution is
to make all proteins with a cleavable N-terminal
sequence. This complication does not occur with
proteins produced in vitro from wheat germ extract. (e)
Wheat germ extracts contain chaperones, and do not
require the addition of chaperones as sometimes
needed for high yields from E. coli S30 extract [37,38].
A comparison of protein production from wheat germ
extract and E. coli S30 extract [39] demonstrated that
a significantly higher proportion of multiple domain
eukaryotic proteins were soluble when translated by
wheat germ extract.
Automation
All of the cell-free operations can be carried out by
hand, and this is how we started using the technology.
Because of the small volume requirements for screening
(25–50 µL) and protein production for structural
studies (4–12 mL), cell-free methods have proved
amenable to automation. CESG makes use of a CellFree
Sciences GeneDecoder1000™ robotic system (Fig. 2)
in automating the small scale screening of constructs
for protein production and solubility. This unit makes
it possible to carry out as many as 1052 small scale
(25 µL) screening reactions per week. CESG has
two prototype robotic units developed by CellFree
Sciences for larger scale protein production (Fig. 2).
The Protemist10™ robotic system requires preparation
of the mRNA off-line, whereas the newer
Protemist100™ starts with DNA and produces the mRNA
transcript prior to the translation step. Each of these
systems supports 24 transcription and translation
reactions of 4 mL each per week. Typical yields for the Protemist
runs are 0.3–0.5 mg purified protein per mL reaction
mixture. These robotic systems handle the many steps
that are tedious to carry out by hand, and work
through the night. They have greatly reduced the man-
power requirements of cell-free screening and protein
production.
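As a worked check on the throughput figures quoted above, the short sketch below multiplies them out. It assumes only the numbers given in the text (24 reactions per week, 4 mL per reaction, 0.3–0.5 mg purified protein per mL); the function and constant names are hypothetical.

```python
# Figures quoted in the text for one Protemist unit; names are illustrative.
REACTIONS_PER_WEEK = 24
REACTION_VOLUME_ML = 4.0
YIELD_RANGE_MG_PER_ML = (0.3, 0.5)


def per_reaction_mg(volume_ml=REACTION_VOLUME_ML, yield_range=YIELD_RANGE_MG_PER_ML):
    """Expected purified protein (low, high) in mg from a single reaction."""
    low, high = yield_range
    return volume_ml * low, volume_ml * high


def weekly_protein_mg(reactions=REACTIONS_PER_WEEK,
                      volume_ml=REACTION_VOLUME_ML,
                      yield_range=YIELD_RANGE_MG_PER_ML):
    """Expected purified protein (low, high) in mg per unit per week."""
    low, high = per_reaction_mg(volume_ml, yield_range)
    return reactions * low, reactions * high


if __name__ == "__main__":
    lo_r, hi_r = per_reaction_mg()
    lo_w, hi_w = weekly_protein_mg()
    print(f"per reaction: {lo_r:.1f}-{hi_r:.1f} mg")
    print(f"per week:     {lo_w:.1f}-{hi_w:.1f} mg")
```

On these assumptions a single 4 mL reaction gives roughly 1.2–2.0 mg of purified protein, which falls within the 1–3 mg per reaction quoted for the translation step, and one unit running at full capacity gives roughly 29–48 mg per week spread over 24 samples.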
Success rates with eukaryotic targets
The centers involved in the NIH Protein Structure Ini-
tiative (USA) are generating information about success
rates in going from a selected target gene to a comple-
ted and deposited three-dimensional protein structure.
The overall success rates still tend to be quite low, in
the range of 2% to 20%, depending on the center and
the types of targets selected. It is clear from all centers
that the yields of structures for eukaryotic targets are
much lower than for prokaryotic targets. In the inter-
est of efficiency and cost savings, it is important to
analyze where failures occur and to devise strategies to
minimize these. The most effective routes for improve-
ment involve a combination of bioinformatics and
small scale screening. Bioinformatics relies on prior
information and mathematical models for correlating
success rates with gene sequences. Small scale screening
offers the most economical way of testing whether a
cloned and sequenced target will proceed through the
critical stages leading to a structure. The initial screen-
ing step determines the level of gene expression and
the solubility of the product. As described above,
CESG’s wheat germ cell-free platform supports rapid
and economical small scale screening for expression
and solubility. We currently test constructs with and
without an N-terminal tag and have shown success in
rescuing failed targets by truncating the N- and/or
C-termini. The second screening operation relevant to
NMR structure determinations is the screening of the
[¹⁵N]-labeled protein target by ¹H-¹⁵N HSQC
spectroscopy. This test, which is repeated after one week to
determine if the protein is stable in solution, is highly
diagnostic for the success of an NMR structure
determination. Proteins that pass this test are then
produced with [¹⁵N + ¹³C]-labeling.
Fig. 2. Fully automated protein synthesizers from CellFree Sciences. (Left) GeneDecoder1000™, which operates in two small scale modes.
In the screening mode, it handles up to four 96 well plates per overnight run, produces 2–10 µg protein per well, and uses 1.0–5.0 mL
wheat germ extract per plate. In the small scale protein production mode it can handle up to two 96 well plates per overnight run, produces
between 10 and 50 µg protein per well, and uses 5.0–10.0 mL wheat germ extract per plate. (Center) Protemist10™ robotic system, which
is capable of carrying out 24 translation reactions of 4 mL each per week. The unit produces 1–3 mg protein per reaction and utilizes 3 mL wheat
germ extract per reaction. This system requires off-line preparation of the mRNA. (Right) Protemist100™ robotic system, which supports
24 transcription and translation reactions of 4 mL each per week. Its capabilities are similar to those of the Protemist10, but it has the added feature
of automated production of mRNA. These robotic systems carry out a variety of operations including solvent extraction, high level
multichannel liquid handling, centrifugation, and incubation at various temperatures. An onboard microprocessor interfaced with the computer
connected to the database keeps detailed log files that contain information about temperatures, volumes, and operational performance at
every step.
We have accumulated experience in using the
cell-free platform to produce proteins from several
eukaryotic genomes. These include over 722 different
structural genomics targets from human, mouse, and
Arabidopsis (Table 1). Most of the targets selected for
testing have coded for proteins less than 25 kDa,
because this is the size limit for high-throughput
structure determinations by NMR spectroscopy. In
addition, we have carried out small scale wheat germ
cell-free screening of approximately 150 larger proteins
(25–70 kDa), and the success rates for expressing sol-
uble proteins appear to be comparable to our earlier
results with smaller targets presented in Table 1.
We define ‘highly soluble’ as ≥ 75% of the total protein
being present in the soluble fraction. Of the same
proteins produced with N-terminal GST tags and
N-terminal (His)₆ tags, 9% more were highly soluble
with the GST tag. Only 5% of proteins soluble as
GST fusions became insoluble following cleavage and
removal of the GST tag. Thus the results show that
proteins fused to GST can be more highly soluble and
that the advantage may persist after the tag is removed
(presumably through improved folding of the purified
fusion protein prior to cleavage).
We have gathered statistics specific to human pro-
teins. Of 174 human targets (most with unknown func-
tion) that were successfully cloned, 135 (78%) showed
expression at levels suitable for structural investiga-
tions. Of these expressed proteins, 55 (41%) were
soluble at levels needed for NMR spectroscopy. Of
these, 36 (66%) gave [¹⁵N]-labeled samples at levels
that could be evaluated by NMR spectroscopy. To
date, nine of these human proteins yielded NMR
structures. In total, CESG has determined NMR
structures of 18 eukaryotic proteins produced by this
methodology (Fig. 3). The average yield of purified, labeled,
human proteins made for NMR structural studies has
been 0.3 mg·mL⁻¹ reaction mixture.
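The stepwise success rates reported in Table 1 can be recomputed from the raw counts. The sketch below is illustrative only: the counts are copied from Table 1, each percentage is the count at a step divided by the count at the previous step (as defined in the table footnote), and the function names are hypothetical. Python's rounding may differ by one percentage point from the published table in a few cells (for example, 135/174 is 77.6%).

```python
# Stage labels and counts transcribed from Table 1.
STAGES = [
    "selected", "cloned", "expressed", "soluble", "15N produced",
    "acceptable HSQC", "stable", "13C,15N made", "3D structure",
]
COUNTS = {
    "Human":       [191, 174, 135, 55, 36, 15, 10, 10, 9],
    "Mouse":       [150, 129, 47, 14, 11, 2, 1, 1, 1],
    "Arabidopsis": [381, 351, 269, 120, 76, 17, 9, 9, 8],
}


def totals(table):
    """Column-wise sums across genomes."""
    return [sum(column) for column in zip(*table.values())]


def stepwise_percentages(counts):
    """Percentage of targets surviving each step relative to the previous one."""
    return [round(100 * cur / prev) for prev, cur in zip(counts, counts[1:])]


def overall_percentage(counts):
    """Structures obtained as a percentage of targets selected."""
    return round(100 * counts[-1] / counts[0], 1)


if __name__ == "__main__":
    total = totals(COUNTS)
    for stage, n, pct in zip(STAGES[1:], total[1:], stepwise_percentages(total)):
        print(f"{stage:16s} {n:4d} ({pct}%)")
    print("overall:", overall_percentage(total), "%")
```

Viewed this way, the largest losses occur at the solubility step and at the HSQC screen, and the overall yield of structures from selected targets is about 2.5%, at the low end of the 2% to 20% range cited above for structural genomics centers.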
Costs
Labor savings, coupled with the high level of incorporation
of labeled amino acids and the high yield of folded
protein samples, make the overall cost of the wheat
germ cell-free method comparable to that of the E. coli
cell-based approach for NMR structure determinations
of eukaryotic proteins. One of the main advantages of
the automated wheat germ cell-free protein expression
system is that the overall process requires much less
time and effort compared to our current cell-based
methods. Not including the cloning steps, it generally
takes 48 h (using the GeneDecoder1000™), or 72 h
(manually), to screen 96 targets for expression and
solubility on the small scale. The purification protocols also
require less time and effort than cell-based protocols
because of the smaller volumes (4–12 mL versus
500–1000 mL) and higher initial purity. Using the latest in
General Electric Healthcare HIS-TRAP purification
technology (Piscataway, NJ), immobilized metal affinity
chromatography (IMAC) purification of His tagged
proteins requires 40 min of processing time and results
in protein samples that are 75–85% pure. Gel filtration
adds 3 h and can increase the purity to
> 95% for proteins < 15 kDa and to 90% for proteins
< 20 kDa. GST purification results in > 95% purity
regardless of size; however, the minimal time to process
the sample is greater than 10 h.
Because stable isotope labeled amino acids required
for NMR structure determinations are expensive, it is
important that the protein yield per quantity of amino
Table 1. Statistics on eukaryotic proteins produced by CESG’s wheat germ cell-free platform.

Step                                      Human        Mouse       Arabidopsis   Total
Small scale (µg), automated 96 well format production overnight
  Targets selected                        191          150         381           722
  Targets cloned successfully             174 (91%)b   129 (86%)   351 (92%)     654 (91%)
  Targets showing acceptable expression   135 (77%)    47 (36%)    269 (77%)     451 (69%)
  Targets showing adequate solubility     55 (41%)     14 (30%)    120 (45%)     189 (42%)
Large scale (mg), automated 8 × 4 mL production overnight
  [¹⁵N]-labeled proteins produced         36 (66%)     11 (79%)    76 (63%)      123 (65%)
  Acceptable [¹⁵N-¹H]-HSQC spectrum       15 (42%)     2 (18%)     17 (22%)      34 (28%)
  Protein stable for > 10 days            10 (67%)     1 (50%)     9 (53%)       20 (59%)
  [¹³C,¹⁵N]-labeled protein made a        10 (100%)    1 (100%)    9 (100%)      20 (100%)
  3D structures by NMR                    9 (90%)      1 (100%)    8 (89%)       18 (90%)

a Average yield of purified double-labeled proteins used in structural investigations was 0.3 mg·mL⁻¹ reaction mixture.
b Percentages represent the number of successful targets at a given step divided by the number coming from the previous step (174/191 = 91% in the case indicated).
[Fig. 3 panel labels: (1) Hs.78877, 11 kDa, PDB: 2G2B; (2) At5g39720.1, 19 kDa, PDB: 2G0Q; (3) At5g66040.1, 14 kDa, PDB: 1TQ1; (4) Dr.13312, 12 kDa, PDB: 2FB7; (5) At3g01050.1, 13 kDa, PDB: 1SE9; (6) At2g24940.1, 11 kDa, PDB: 1T0G; (7) At3g51030.1, 14 kDa, PDB: 1XFL; (8) Hs.102419, 13 kDa, PDB: 1ZR9; (9) Hs.157607, 14 kDa, PDB: 2ETT; (10) At2g23090.1, 9 kDa, PDB: 1WVK; (11) P62627, dimer 22 kDa, PDB: 1Y4O; (12) At2g46140.1, 19 kDa, PDB: 1YYC.]
Fig. 3. Examples of three-dimensional solution structures of eukaryotic proteins determined by NMR spectroscopy from labeled samples
produced by the wheat germ cell-free platform described here. All structures have been deposited in the Protein Data Bank under the
accession codes indicated. The molecular masses of the proteins are indicated; these proteins are relatively small because they were chosen as
targets for high-throughput NMR structure determination, which currently has a practical size limit of 25 kDa. (1) Hs.78877 is human
allograft inflammatory factor 1. (2) At5g39720.1 is a protein of unknown function from A. thaliana. (3) At5g66040.1 is a single domain
sulfurtransferase and is annotated as a senescence-associated protein (sen1-like protein) and ketoconazole resistance protein. (4) Dr.13312 is a
protein of unknown function from zebrafish. (5) At3g01050.1 from A. thaliana has a ubiquitin-like fold, and may be prenylated at a putative
C-terminal CAAX box motif so as to target the protein and its binding partners to a membrane compartment of the cell [32]. (6) At2g24940.1
from A. thaliana gave a structure with a cytochrome b5-like fold but with some resemblance to steroid binding proteins [42]; a subsequent
NMR study showed that the protein binds progesterone. This protein failed to express in the E. coli cell-based pipeline. (7) At3g51030.1 is
an h1 thioredoxin from A. thaliana [43]. This protein was also produced from E. coli cells; it gave an acceptable HSQC spectrum but failed
to crystallize. (8) Hs.102419 is a human C2H2-type zinc finger protein. (9) Hs.157607 is the PX domain of human sorting nexin 22. (10)
At2g23090.1 is an unknown, partially disordered protein from A. thaliana. (11) P62627 from mouse is isoform 1 of Roadblock/LC7, a light
chain in the dynein complex [44]. (12) At2g46140.1 from A. thaliana is a late embryogenesis abundant (LEA) protein of a type expressed under
conditions of cellular stress, such as desiccation, cold, osmotic stress, and heat [45].
acid supplied be high. With cell-free systems (E. coli or
wheat germ), 10% of the labeled amino acid mixture
supplied is incorporated into the protein produced and
purified.
Although the cell-free approach is much less labor
intensive in comparison to our E. coli cell-based pipe-
line, it requires more expensive reagents and supplies.
Current limitations of the method stem from the
restricted availability and high cost of highly active
wheat germ extract. These problems should ease as the
wheat germcell-free approach becomes more wide-
spread and as increasing demands forcell-free extract
stimulate improvements in production technology. The
costs of stable isotope labeled amino acids also may be
expected to decrease as demand accelerates. Average
supply costs currently are: US$47 per target for cloning
and expression solubility testing (with unpurified
reaction mixture assayed by SDS/PAGE), US$370 per
mg for Se-Met protein, US$390 per mg for [15N]protein,
and US$470 per mg for [13C,15N]protein (with
proteins isolated and purified).
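
The average supply costs quoted above lend themselves to a quick back-of-the-envelope estimate. The following is only a minimal sketch: the function and constant names are hypothetical, and it assumes that cost scales linearly with the mass of purified protein and that the per-target screening cost is simply added on top, neither of which is stated in the text.

```python
# Average supply costs quoted in the text (US$). All names here are illustrative.
SCREENING_COST_PER_TARGET = 47.0  # cloning plus expression/solubility testing
COST_PER_MG = {
    "Se-Met": 370.0,    # selenomethionine-labeled protein
    "15N": 390.0,       # [15N]protein
    "13C,15N": 470.0,   # [13C,15N]protein
}


def estimate_supply_cost(label: str, mg: float, include_screening: bool = True) -> float:
    """Rough supply cost (US$) for `mg` of purified protein with the given label.

    Assumption (not from the source): cost is linear in mass, and the
    one-time screening cost per target is added when requested.
    """
    if label not in COST_PER_MG:
        raise ValueError(f"unknown label {label!r}; expected one of {sorted(COST_PER_MG)}")
    if mg < 0:
        raise ValueError("mg must be non-negative")
    cost = COST_PER_MG[label] * mg
    if include_screening:
        cost += SCREENING_COST_PER_TARGET
    return cost


if __name__ == "__main__":
    for label in COST_PER_MG:
        total = estimate_supply_cost(label, 1.0)
        print(f"{label}: US${total:.0f} for 1 mg including screening")
```

Under these assumptions, 10 mg of Se-Met protein would come to roughly US$3700 in supplies before screening costs; actual costs will depend on yield and reagent pricing.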
The major advantages of the wheat germ cell-free
method over the E. coli cell-based pipeline are that it
supports the production of a larger fraction of targets
as folded, soluble protein and that it is much faster to
prepare additional samples or truncated samples as
needed for successful structure determinations. The
E. coli approach has a cost advantage when its protein
yields are much higher than those of the cell-free method. The overall costs
of each approach appear to be similar for NMR struc-
ture determinations.
Prospects
Because of the complementarity of cell-free and cell-
based methods, we envision that it will be most
efficient to screen each new target by both methods.
Initially, we did not have an easy way to screen tar-
gets by the two approaches, because the cell-based
pipeline was using ligation-independent cloning tech-
nology, whereas the cell-free pipeline used ligation clo-
ning into the pEU vector. To remedy this, we recently
implemented a cloning strategy that enables efficient
small scale screening by cell-free and cell-based methods
[40]; this approach utilizes Promega’s Flexi®
Vector technology to transfer the target gene from
one plasmid to another. By comparing the small scale
screening results from the two platforms, we can now
choose the one more likely to be successful. If the
cell-based approach is selected for an NMR target,
we make use of a self-induction medium developed
for producing [15N]- or [13C,15N]-labeled protein
from E. coli cells [41].
The largest remaining bottlenecks associated with
the wheat germ cell-free protocol are the limited solubility,
aggregation, or limited stability exhibited by
many targets. Improvements in any of these areas
would greatly lower the costs of structure determina-
tions. Our ongoing research is aimed at investigating
reasons for failures of these types and at developing
approaches for rescuing failed targets. Some structural
genomics centers start multiple constructs for each tar-
get selected (different N- and C-termini, different
fusions, or different vectors and hosts) and choose the
one that yields the most soluble protein. We have initi-
ated a pilot study aimed at determining whether the
initial production of constructs with multiple N- and
C-termini for small scale screening would be more effi-
cient than our current approach of redesigning failed
constructs.
Currently, CESG’s X-ray structure pipeline requires
on the order of 10 mg of Se-Met protein for each target.
We anticipate that as reliable small scale crystallization
screening methods become available, the wheat
germ cell-free method could become part of the X-ray
crystallography pipeline. We have already determined
by mass spectrometry that the wheat germ cell-free
approach supports high level incorporation of Se-Met,
and we have made small quantities of Se-Met-labeled
proteins for use in chip (Fluidigm, South San Francisco,
CA) crystallization screening.
Acknowledgements
We gratefully acknowledge the work of all CESG staff
members and collaborators and fruitful interactions
with Professor Y. Endo and his group at Ehime Uni-
versity, Matsuyama, Japan, and staff members of
CellFree Sciences Co., Ltd. (Yokohama, Japan)
in adapting their technology to research and product-
ion environments. Supported by NIH grants 1U54
G074901 (which supports CESG), and P41 RR02301
(which supports the National Magnetic Resonance
Facility at Madison, where NMR spectroscopy was
carried out).
References
1 Kramer G, Kudlicki W, Hardesty B, Higgens SJ &
Hames BD (1999) Cell-free coupled transcription-translation
systems from Escherichia coli. In Protein
Expression. A Practical Approach (Higgens SJ & Hames
BD, eds), pp. 201–223. Oxford University Press, Oxford,
UK.
2 Clemens MM, Prujin GJ, Higgens SJ & Hames BD
(1999) Protein synthesis in eukaryotic cell-free systems.
In Protein Expression. A Practical Approach (Higgens
SJ & Hames BD, eds), pp. 129–165. Oxford University
Press, Oxford, UK.
3 Cubeddu L, Moss CX, Swarbrick JD, Gooley AA,
Williams KL, Curmi PM, Slade MB & Mabbutt BC
(2000) Dictyostelium discoideum as expression host:
isotopic labeling of a recombinant glycoprotein for
NMR studies. Protein Expr Purif 19, 335–342.
4 Strauss A, Bitsch F, Cutting B, Fendrich G, Graff P,
Liebetanz J, Zurini M & Jahnke W (2003) Amino-acid-
type selective isotope labeling of proteins expressed in
baculovirus-infected insect cells useful for NMR studies.
J Biomol NMR 26, 367–372.
5 Bruggert M, Rehm T, Shanker S, Georgescu J & Holak
TA (2003) A novel medium for expression of proteins
selectively labeled with 15N-amino acids in Spodoptera
frugiperda (Sf9) insect cells. J Biomol NMR 25, 335–348.
6 Goff SA & Goldberg AL (1987) An Increased Content
of Protease LA, the Lon Gene-Product, Increases Pro-
tein-Degradation and Blocks Growth in Escherichia coli.
J Biol Chem 262, 4508–4515.
7 Maurizi MR (1987) Degradation in vitro of bacterio-
phage lambda N protein by Lon protease from Escheri-
chia coli. J Biol Chem 262, 2696–2703.
8 Chrunyk BA, Evans J, Lillquist J, Young P & Wetzel R
(1993) Inclusion-Body Formation and Protein Stability
in Sequence Variants of Interleukin-1-Beta. J Biol Chem
268, 18053–18061.
9 Shi J, Pelton JG, Cho HS & Wemmer DE (2004) Pro-
tein signal assignments using specific labeling and cell-
free synthesis. J Biomol NMR 28, 235–247.
10 Torizawa T, Shimizu M, Taoka M, Miyano H &
Kainosho M (2004) Efficient production of isotopically
labeled proteins by cell-free synthesis: a practical proto-
col. J Biomol NMR 30, 311–325.
11 Yabuki T, Kigawa T, Dohmae N, Takio K, Terada T,
Ito Y, Laue ED, Cooper JA, Kainosho M & Yokoyama
S (1998) Dual Amino Acid-Selective and Site-Directed
Stable-Isotope Labeling of the Human c-Ha-Ras Protein
by Cell-Free Synthesis. J Biomol NMR 11, 295–306.
12 Kigawa T, Muto Y & Yokoyama S (1995) Cell-Free
Synthesis and Amino Acid-Selective Stable Isotope
Labeling of Proteins for NMR Analysis. J Biomol NMR
6, 129–134.
13 Kainosho M, Torizawa T, Iwashita Y, Terauchi T, Mei
Ono A & Güntert P (2006) Optimal isotope labelling
for NMR protein structure determinations. Nature 440,
52–57.
14 Klammt C, Lohr F, Schafer B, Haase W, Doetsch V,
Rüterjans H, Glaubitz C & Bernhard F (2004) High
level cell-free expression and specific labeling of integral
membrane proteins. Eur J Biochem 271, 568–580.
15 Henrich B, Lubitz W & Plapp R (1982) Lysis of Escher-
ichia coli by induction of Cloned Phi-X174 Genes. Mol
Gen Gen 185, 493–497.
16 Guignard L, Ozawa K, Pursglove SE, Otting G &
Dixon NE (2002) NMR analysis of in vitro-synthesized
proteins without purification: a high-throughput
approach. FEBS Lett 524, 159–162.
17 Kohno T (2005) Production of proteins for NMR studies
using the wheat germ cell-free system. Methods Mol
Biol 310, 169–185.
18 Kigawa T & Yokoyama S (2002) [High-throughput cell-
free protein expression system for structural genomics
and proteomics studies]. Tanpakushitsu Kakusan Koso
47, 1014–1019.
19 Yokoyama S (2003) Protein expression systems for
structural genomics and proteomics. Curr Opin Chem
Biol 7, 39–43.
20 Kigawa T, Yabuki T, Yoshida Y, Tsutsui M, Ito Y,
Shibata T & Yokoyama S (1999) Cell-free production
and stable-isotope labeling of milligram quantities of
proteins. FEBS Lett 442, 15–19.
21 Kim DM, Kigawa T, Choi CY & Yokoyama S (1996)
A Highly Efficient Cell-Free Protein Synthesis System
from Escherichia coli. Eur J Biochem 239, 881–886.
22 Yokoyama S, Hirota H, Kigawa T, Yabuki T, Shirouzu
M, Terada T, Ito Y, Matsuo Y, Kuroda Y, Nishimura
Y, Kyogoku Y, Miki K, Masui R & Kuramitsu S
(2000) Structural genomics projects in Japan. Nat Struct
Biol 7 (Suppl.), 943–945.
23 Kim DM & Swartz JR (2000) Prolonging cell-free pro-
tein synthesis by selective reagent additions. Biotechnol
Prog 16, 385–390.
24 Yin G & Swartz JR (2004) Enhancing multiple disulfide
bonded protein folding in a cell-free system. Biotechnol
Bioeng 86, 188–195.
25 Chumpolkulwong N, Hori-Takemoto C, Hosaka T,
Inaoka T, Kigawa T, Shirouzu M, Ochi K &
Yokoyama S (2004) Effects of Escherichia coli ribosomal
protein S12 mutations on cell-free protein synthesis.
Eur J Biochem 271, 1127–1134.
26 Madin K, Sawasaki T, Ogasawara T & Endo Y (2000)
A highly efficient and robust cell-free protein synthesis
system prepared from wheat embryos: Plants apparently
contain a suicide system directed at ribosomes. Proc
Natl Acad Sci USA 97, 559–564.
27 Endo Y (2001) Genomics to Proteomics: A High-throughput
Cell-free Protein Synthesis System for Practical
Use. The 3rd ORCS International Symposium on
Ribosome Engineering, January 22–23, 2001. Tsukuba,
Japan.
28 Kawasaki T, Gouda MD, Sawasaki T, Takai K & Endo
Y (2003) Efficient synthesis of a disulfide-containing
protein through a batch cell-free system from wheat
germ. Eur J Biochem 270, 4780–4786.
29 Morita EH, Sawasaki T, Tanaka R, Endo Y & Kohno
T (2003) A wheat germ cell-free system is a novel way
to screen protein folding and function. Protein Sci 12,
1216–1221.
30 Sawasaki T, Ogasawara T, Morishita R & Endo Y
(2002) A cell-free protein synthesis system for high-throughput
proteomics. Proc Natl Acad Sci USA 99,
14652–14657.
31 Sawasaki T, Hasegawa Y, Tsuchimochi M, Kamura
N, Ogasawara T, Kuroita T & Endo Y (2002) A
bilayer cell-free protein synthesis system for high-throughput
screening of gene products. FEBS Lett
514, 102–105.
32 Vinarov DA, Lytle BL, Peterson FC, Tyler EM,
Volkman BF & Markley JL (2004) Cell-free protein
production and labeling protocol for NMR-based
structural proteomics. Nat Methods 1, 149–153.
33 Vinarov DA & Markley JL (2005) High-Throughput
Automated Platform for NMR-Based Structural Proteomics.
Expert Rev Proteomics 2, 49–55.
34 Vinarov DA, Tyler EM, Loushin Newman CL, Shahan
MN & Markley JL (2006) Protein Production using the
Wheat Germ Cell-Free Expression System. In Current
Protocols in Protein Science (Coligan JE, Dunn BM,
Ploegh HL, Speicher DW & Wingfield PT, eds. Series
ed. Taylor G), pp. 5.18.1–5.18.18. Unlimited Learning
Resources, Winston-Salem, NC.
35 Tyler RC, Aceti DJ, Bingman CA, Cornilescu CC, Fox
BG, Frederick RO, Jeon WB, Lee MS, Newman CS,
Peterson FC, Phillips GN Jr, Shahan MN, Singh S,
Song J, Sreenath H, Tyler EM, Ulrich EL, Vinarov
DA, Vojtik FC, Volkman BF, Wrobel RL, Zhao Q &
Markley JL (2005) Comparison of cell-based and cell-
free protocols for producing target proteins from the
Arabidopsis thaliana genome for structural studies.
Proteins 59, 633–643.
36 Betton JM (2003) Rapid translation system (RTS): a
promising alternative for recombinant protein produc-
tion. Curr Protein Pept Sci 4, 73–80.
37 Ryabov LA, Desplancq D, Spirin AS & Pluckthun A
(1997) Functional antibody production using cell-free
translation: effects of protein disulfide isomerase and
chaperones. Nat Biotechnol 15, 79–84.
38 Kang SH, Kim DM, Kim HJ, Jun SY, Lee KY & Kim
HJ (2005) Cell-free production of aggregation-prone
proteins in soluble and active forms. Biotechnol Prog 21,
1412–1419.
39 Hirano N, Sawasaki T, Tozawa Y, Endo Y & Takai K
(2006) Tolerance for random recombination of domains
in prokaryotic and eukaryotic translation systems: Lim-
ited interdomain misfolding in a eukaryotic translation
system. Proteins 64, 343–354.
40 Blommel PG, Martin PA, Wrobel RL, Steffen E & Fox
BG (2006) High-efficiency single-step production of
expression plasmids from cDNA clones using the Flexi
Vector cloning system. Protein Expr Purif 47, 562–570.
41 Tyler RC, Sreenath H, Aceti DJ, Bingman CA, Singh S,
Markley JL & Fox BG (2005) Auto-Induction Medium
for the Production of [U-15N]- and [U-13C, U-15N]-labeled
Proteins for NMR Screening and Structure Determination.
Protein Expr Purif 40, 268–278.
42 Song J, Vinarov D, Tyler EM, Shahan MN, Tyler RC
& Markley JL (2004) Hypothetical protein At2g24940.1
from Arabidopsis thaliana has a cytochrome b5 like fold.
J Biomol NMR 30, 215–218.
43 Peterson FC, Lytle BL, Sampath S, Vinarov D, Tyler
E, Shahan M, Markley JL & Volkman BF (2005) Solu-
tion structure of thioredoxin h1 from Arabidopsis thali-
ana. Protein Sci 14, 2195–2200.
44 Song J, Tyler RC, Lee MS, Tyler EM & Markley JL
(2005) Solution structure of isoform 1 of Roadblock/LC7,
a light chain in the dynein complex. J Mol
Biol 354, 1043–1051.
45 Singh S, Cornilescu CC, Tyler RC, Cornilescu G,
Tonelli M, Lee MS & Markley JL (2005) Solution
structure of a late embryogenesis abundant protein
(LEA14) from Arabidopsis thaliana, a cellular stress-
related protein. Protein Sci 14, 2601–2609.