Simulation of Stochastic Chemical Systems: Applications in the Design and Construction of Synthetic Gene Networks
A DISSERTATION SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF
THE UNIVERSITY OF MINNESOTA
BY
Howard Michael Salis
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
UNDER THE GUIDANCE OF
Yiannis Kaznessis
Month/Year of Degree Clearance: February 2007.
UMI Number: 3244458
Copyright 2006 by Salis, Howard Michael
All rights reserved
UMI Microform 3244458
Copyright 2007 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road, P.O. Box 1346, Ann Arbor, MI 48106-1346
This dissertation is copyrighted material. All rights reserved. © Howard Salis 2006-2007
Acknowledgements
My doctoral research would not have been possible without the continued help and support of many dear people. Writing a dissertation can be an especially isolating task, and I am grateful to those who made it an enjoyable experience.
I would like to thank my advisor, Yiannis Kaznessis, for his generous time and commitment. Throughout my doctoral work, he encouraged me to develop independent thinking and research skills and allowed me to explore new areas of mathematics with no guarantee of success. He continually stimulated new thoughts and greatly assisted me with scientific writing.
I also extend my thanks to my fellow graduate students and friends in the Kaznessis research group for helping me in many ways with my graduate studies. Jonathan Tomshine, Vassily Sotiropoulos, and John Barrett each greatly assisted me in advancing the rational design of synthetic gene networks. In addition, I am grateful to Himanshu Khandelia, Spyros Vicatos, Allison Langham, Abdallah Sayyed, Chandrika Mulakala, and Dan Bolintineau for both scholarly and not-so-scholarly discussions on a wide variety of topics, for sharing tea times and cookies, for baking (thanks Allison!) and eating birthday cakes, and for making graduate school more than just a list of published papers. I will miss our time together.
I would like to thank Jennifer Maynard for allowing me to use her laboratory and equipment, and for teaching me a variety of genetic engineering techniques with her usual zeal and happiness. I also express my gratitude to Benjamin Roy, Ryan Myhre, Kavita Ramalingam, and Rakesh Motani for their patience with my many questions while I was working in their lab.
I also thank David Morse, Hans Othmer, Marie Contou-Carrere, Chetan Gadgil, and Chang Hyeong Lee for intellectual stimulation, discussions, and thoughtful suggestions on manuscripts.
I would also like to extend my sincerest gratitude to Prabhas Moghe and Jane Tjia, who mentored my research studies while I was at Rutgers University. They gave an inexperienced freshman a real job in their lab. I will always be grateful for their patience, their training, and their encouragement. I would also like to thank Troy Shinbrot and Stephen Conway at Rutgers for continuing that encouragement.
Of course, I have had many teachers over the years, and I would not be where I am without them. I thank them all, but would especially like to thank Mr. Holmquist, who instilled a love of biology in me, and Mrs. D’Esposito, who taught me that “It’s not ‘I Know’, it’s ‘I Do’.”
I have been blessed with some great friends who have always been there to share in difficult and joyous occasions, to lend an ear, or simply to study the sometimes hilarious behavior of that well-known vociferous species of Leporidae Brachylagus. To C.M., Rick, Andy, Jane, Kristen, and Alan, I thank you all for your friendship.
I also extend my heartfelt thanks to Alexis, who has brightened my life and given me great joy. This dissertation would not have been possible without your daily encouragement.
Finally, I thank my parents for always loving me and being there to help solve problems big and small. They are my greatest teachers in life.
Funding for this research was provided by the United States National Institutes of Health’s biotechnology training grant (GM08347), the United States National Science Foundation (BES-0425882), and the Army High Performance Computing Research Center (AHPCRC) of the US Army Research Lab (Contract DAAD10-01-2-0014). Computational support was also provided by the University of Minnesota’s Digital Technology Center, the NSF-funded TeraGrid, and the National Center for Supercomputing Applications (TG-MCA04N033).
This dissertation is dedicated to my parents, Jan and Barry, who lovingly brought me into this world and taught me
An Overview of our Methodology
A Brief Introduction to the Mathematical Results
A Brief Introduction to Probability and Stochastic Processes
The Numerical Simulation of Jump Markov and Poisson Processes
CONTENTS
2.4 The Numerical Solution of Itô Stochastic Differential Equations
2.4.1 Definitions and Formal Solutions
2.4.2 Explicit Solutions of Some Stochastic Differential Equations
2.4.3 Strong and Weak Solutions
2.4.4 Itô and Stratonovich Stochastic Integrals
2.4.5 The Itô Formula and Itô-Taylor Expansions
2.4.6 Numerical Generation of Stochastic Integrals
2.4.7 Itô-Taylor Explicit Numerical Schemes
2.4.8 Implicit Stochastic Numerical Schemes
2.4.9 Adaptive Time Step Schemes
2.5 HyJCMSS: The Hybrid Jump/Continuous Markov Stochastic Simulator
2.5.1 Introduction
2.5.3 Algorithms
2.5.4 Examples, Error Analysis, and Critical Comparisons
2.5.5 Discussion
2.6 An Equation-Free Probabilistic Steady State Approximation
2.6.1 Introduction
2.6.2 An Illustrative Example
2.6.3 Theory
2.6.4 Numerical Implementation
2.6.5 Accuracy and Speed
2.6.6 Discussion
2.6.7 Conclusion
2.7 Hy3S: Hybrid Stochastic Simulation for Supercomputers
2.7.1 Introduction
2.7.2 Software Implementation
2.7.3 Solution of a Hybrid Jump/Continuous Markov Process
2.7.4 The Fixed Euler-Maruyama Method
2.7.5 The Fixed Milstein Method
2.7.6 Adaptive Methods
2.7.7 The Graphical User Interface
2.7.8 Results and Examples
2.7.10 Conclusion
3 Design of Synthetic Gene Networks
3.1 Introduction
3.1.1 An Overview of the Chapter
3.2 An Overview of Regulated Bacterial Gene Expression
3.2.1 Transcription
3.2.2 Translation
3.2.3 The Regulation of Transcriptional Interactions
3.3.3 The Chemical Partition Function and Equilibrium Holoenzyme Formation
Construction, Characterization, and Mathematical Analysis of a Synthetic Promoter
4.3.2 The Steady-State Distribution of GFP Fluorescence over Varying Inducer
A Steady-State Mathematical Model of the Synthetic Promoter
New Numerical Methods for Stochastic Bifurcation Analysis
5.3.1 Approximating the Action of the Forward and Reverse Time Cocycle
Forward and Reverse Master Equations
Forward and Reverse Stochastic Simulation
List of Tables
2.1 Four important characteristics of five useful probability distribution functions
2.2 A list of the mass action rate laws for stochastic chemical kinetics. N_A: Avogadro’s number; V: Volume; Molec: Molecules
2.3 Cycle Test reactions and parameters
2.4 Ratios of Computational Run Times of Cycle Tests
2.5 A Simplified Model of the Pulse Generating Gene Network in Drosophila Circadian Rhythm
2.6 The crystallization reaction system and kinetic parameters
2.7 The effect of time step on the run time and number of SDE integration steps of three hybrid stochastic methods
2.8 A Benchmark Model for Large-Scale Reaction Networks
2.9 Computational Run Times and SDE Calls of the Benchmark Models of Large-Scale
2.10 The effect of Θ and ω on the probabilistic steady state approximation’s speed up when simulating the illustrative example reaction network. Parameter λ is constant at 10.
2.11 Accuracy and speed up of the probabilistic steady state approximation for the second example reaction network. Parameter λ is constant at 30000.
2.12 The reactions, kinetic constants, and initial conditions of the protein-protein interaction network example
2.13 The effect of increasing ω on the accuracy and efficiency of the stochastic simulation of the non-linear protein-protein interaction network
2.14 A diagnostic reaction network with multiple timescales is shown
2.15 An overview of the Hy3S numerical methods
2.16 A description of each command line argument and their default values
2.17 A Non-Linear Cycle Test
2.18 A comparison of computational times of a large-scale system benchmark. The computational times of a large-scale system benchmark using the fixed Euler-Maruyama (EM) and Milstein implementations of the HyJCMSS algorithm and the Next Reaction variant of the stochastic simulation algorithm (SSA). ND: Not Determined
2.19 A bistable biochemical network with multiple timescales and spontaneous escape
3.1 A list of consensus sequences for E. coli σ factors
A list of the thermodynamic binding free energies between the inducer-bound lac and
A selected list of ribosome binding sites (RBSs). The DNA sequences starting with AGGA and ending with a start codon are shown. The sequences are qualitatively ranked by their translation efficiency (average proteins per mRNA transcript with all other determinants equal), with one being the most efficient. The Gibbs free energies of the mRNA folding into a secondary structure (ΔG_folding) and the rRNA:mRNA hybridization (ΔG_hybrid) are shown for comparison (calculated using UNAfold [191]).
The enumeration and Gibbs free energies of the regulatory states of an example promoter with two operators and a single transcription factor. N_A: Avogadro’s number; V:
A brief list of variant DNA sites and repressor proteins from the lac and tet operons
The primary and secondary pairs of oligonucleotides used in these experiments are shown.
The rate of cell division [hr^-1] at varying inducer concentrations, measured by OD600 absorbance. ND: Not Determined. Data courtesy of John Barrett.
All 55 unique regulatory states of the synthetic promoter are shown with their corresponding Gibbs free energies and density of microstates.
(Continued) All 55 unique regulatory states of the synthetic promoter are shown with their corresponding Gibbs free energies and density of microstates.
of chemical and biochemical reactions and simulate their stochastic dynamics using advanced hybrid and multi-scale solvers. We also use various design techniques to identify which genetic components must be used to create a synthetic gene network with a desired behavior.
A description of the experimental portion of our combined theory-experiment approach is shown. We construct synthetic gene networks and measure their single-cell gene expression dynamics in Escherichia coli. We expand our toolbox of well-characterized genetic components by comparing the experimental data to the model results and calculating missing kinetic or thermodynamic data. We then repeat this process with additional synthetic gene networks to verify the data.
The probability distribution function, P(X), and cumulative distribution function, F(X), of a uniform random variable, X ~ URN(a, b).
The probability distribution function, P(X), and cumulative distribution function, F(X), of an exponentially distributed random variable, X ~ Exp(λ).
The probability distribution function, P(n), and cumulative distribution function, F(n), of a Poisson distributed random variable, n ~ Poisson(λt).
The probability distribution function, P(X), and cumulative distribution function, F(X), of a Gamma distributed random variable, X ~ Gamma(N, λ).
The probability distribution function, P(X), and cumulative distribution function, F(X), of a Gaussian distributed random variable, X ~ N(μ, σ²).
Three trajectories of the random walk called Gambler’s Ruin, starting at X_0 = 100, are shown. The red gambler is not doing so well.
(Left) A single trajectory of the displacement and velocity of a Brownian particle is shown with a diffusion coefficient, D = 1, on the interval T = [0, 1]. (Right) The same velocity trajectory is shown on T = [0.4, 0.6] and T = [0.45, 0.55], showing the fractal
LIST OF FIGURES
2.8 A Lévy process is the sum of a deterministic, Poisson, and Wiener process. (Left) Single trajectories of the deterministic, Poisson, and Wiener processes are shown with α = −1 and σ = λ = 1 on the interval T = [0, 10]. (Right) The resulting Lévy process is shown on the intervals T = [0, 10] and T = [0, 100].
2.9 Using the Euler-Maruyama scheme, the numerical solution of the simple linear stochastic differential equation in Eq. (2.119) is calculated using a time step of (red) Δt = 2^-7 or (yellow) Δt = 2~-> and compared to the (blue) exact solution.
2.10 The strong order of accuracy of the Euler-Maruyama scheme is determined by calculating the slope of a linear fit of a log-log graph of the average absolute error between the numerical and exact solutions, <ε_strong>, versus the time step of the Euler-Maruyama scheme, Δt. The strong order of accuracy γ = 0.5 is verified. The average is taken over
2.11 Using the Milstein scheme, the numerical solution of the simple linear stochastic differential equation in Eq. (2.119) is calculated using a time step of (red) Δt = 2^-7 or (yellow) Δt = 2>> and compared to the (blue) exact solution.
2.12 The strong order of accuracy of the Milstein scheme is determined using the same method as in Figure 2.10. The strong order of accuracy γ = 1 is verified.
2.13 Comparison of the (A) mean and (B) variance of the Cycle Test with a system size of 100, using the (lines) stochastic simulation algorithm and the (dots) HyJCMSS method without the Multiple Slow Reaction approximation.
2.14 The weak (A, C) mean and (B, D) variance errors of species A and D of the Cycle Test at system sizes (light/dotted) 100, (dark/dashed) 1000, and (dark/solid) 10 000, using the (A, B) HyJCMSS method and (C, D) HyJCMSS method with the Multiple Slow
2.15 Probability distributions of (left) species A, B, C and (right) D, E of the Cycle Test with a system size of 100 and a time of 5 seconds, comparing the (lines) SSA and (dots) HyJCMSS method without the MSR approximation.
2.16 Probability distributions of species (left) A, B, C and (right) D, E of the Cycle Test with a system size of 100 and a time of 40 seconds, comparing the (lines) SSA and (dots) HyJCMSS method without the MSR approximation.
2.17 The (solid lines) oscillatory dynamics of monomers P1 and P2 of the Pulse Generator reaction system are compared alongside their (dashed lines) computational running times, using the (A) HyJCMSS method and (B) SSA. Note that the right Y-axis is the number of seconds the processor requires to simulate the system up to that point.
2.18 The weak (A, B) mean and (C, D) variance errors of species (A, C) A and (B, D) C of the crystallization example, using the Direct Hybrid method, the HyJCMSS method, and the HyJCMSS method with the Multiple Slow Reaction (MSR) approximation with time steps, Δt_SDE, of (triangles pointed up) 0.01, (squares) 0.05, (diamonds) 0.1, (trian-
2.19 A stochastic simulation trajectory of the dynamics of reactions R1 through R3 for 1000 seconds. The light lines are the species labeled A, B, and C, while the dark lines are the species labeled E and D. The vertical dotted lines define the times
2.20 The time evolution of the (top) mean and (bottom) variance of species A, E, and C of the illustrative example reaction network, using either the (solid / blue) stochastic simulation algorithm or the (circles / red) probabilistic steady state approximation with
2.21 The probability distributions of species A and E of the illustrative example reaction network at multiple time points, using either the (solid / blue) stochastic simulation algorithm or the (dashed / red) probabilistic steady state approximation with parameter set (λ, Θ, ω) = (10, 10, 10). (Inset) The time evolution of the L2 distance between the
2.22 The time evolution of the (top) maximum weak mean error and the (bottom) maximum weak variance error of the illustrative example reaction network, using the probabilistic steady state approximation with λ = 10 and varying Θ and ω.
2.23 The average L2 distance between exact and PSSA-enabled probability distributions of the illustrative example reaction network as a function of the parameters Θ and ω
2.24 The probability distribution of species S1 in the second example at t = 0.2 seconds, comparing the usage of either (solid / blue) the stochastic simulation algorithm, (circles / red) the probabilistic steady state approximation with (Θ, ω) = (10, 10), or (triangles /
2.25 The probability distribution of species S2 in the second example at t = 0.2 seconds, comparing the usage of the stochastic simulation algorithm and the probabilistic steady state approximation. The lines and markers are the same as in Figure 2.24.
2.26 The probability distribution of species S3 in the second example at t = 0.2 seconds, comparing the usage of the stochastic simulation algorithm and the probabilistic steady state approximation. The lines and markers are the same as in Figure 2.24.
2.27 The maximum weak mean and variance errors for the second example, comparing the probabilistic steady state approximation with different parameter sets.
2.28 The probability distributions of species S1:P, S2:P, S3:P, S4:P, and S1:S2:P from the non-linear protein-protein interaction reaction network at a time of 100 seconds, obtained by using either the (solid / blue) original stochastic simulation algorithm, the (circles / red) probabilistic steady state approximation with (Θ, ω) = (10, 10), and the
2.29 The probability distributions of species P, A, and S1:S4:P from the non-linear protein-protein interaction reaction network at a time of 100 seconds, using the same lines and
2.30 The effect of the average size of the gap in time scales, <G(t)>, in a diagnostic reaction network on the speed up of the probabilistic steady state approximation with (Θ, ω) = (25, 1000). (Dashed) A straight line fit to the log-log transformation of the data, with the algorithm performance scaling as a power of <G(t)>. (Dotted vertical) The break-even point in terms of speed up for using the probabilistic steady state approximation, including the computational overhead.
to the raw data. (C) A 20-point forward/backward zero-phase distortion filter has been applied to the raw data. The black arrows show the times at which the system is assumed to have relaxed to a quasi-steady state distribution.
The main window of the graphical user interface.
Two auxiliary windows showing the interfaces for adding reactions and setting initial
An auxiliary window showing the interface for adding systematic variations of kinetic
Probability Distributions of the Non-Linear Cycle Test. Probability distributions of species (Top) E and (Bottom) H of the Non-Linear Cycle Test at a time of 10 seconds for increasing system sizes, ranging from 100 to 10 000. The chemical Langevin and differential Jump equations are integrated using a fixed Euler-Maruyama scheme with a time step of 0.01 seconds. All other parameters are set to default values.
Weak mean and variance errors of the Non-Linear Cycle Test. The average normalized weak (Top) mean and (Bottom) variance errors of the Non-Linear Cycle Test using the Euler-Maruyama scheme with a fixed time step of 10-7 seconds and system sizes of (red) 100, (green) 200, (blue) 316, (magenta) 1000, (cyan) 3160, and (black) 10 000.
Effect of integrator time step, Δt_SDE, on weak mean and variance errors. The average normalized weak (blue) mean and (red) variance errors of the Non-Linear Cycle Test with a system size of 10 000 using the fixed time step Euler-Maruyama scheme over a range of Δt_SDE values. All other parameters are set to default values.
Effect of the adaptive scheme’s user-defined tolerance, SDE_Tol, on weak mean and variance errors. The average normalized weak (blue) mean and (red) variance errors of the Non-Linear Cycle Test with a system size of 10 000 using the adaptive time step Milstein scheme over a range of user-defined SDE_Tol values. All other parameters are set to default values.
Distribution and trajectories of the Schlögl reaction module. (Top) The relative probability distribution of the number of E’ molecules at 50 seconds. (Bottom) Out of 10 000 independent trajectories, the 225 shown here exhibit spontaneous transitions from low to high or high to low numbers of E’ molecules.
Branching of solution affects bound scaffold complexes. An ensemble of 10 000 trajectories of the S1:P2:S3 scaffold complex, where trajectories are colored according to the branch of the solution. The number of E’ molecules resides in either the (red) low or (blue) high stable state. Both the (black solid lines) mean and (black dashed lines) mean ± standard deviation are shown for both branches.
Multimodal distributions of mRNA and scaffold molecules over time. The relative probability distribution of the numbers of (Left) mRNA molecules and (Right) free scaffold molecules at (blue) 500 seconds, (red) 1000 seconds, and (green) 2000 seconds.
The linear alignment of the RNA polymerase subunits, the conserved protein domains of sigma factors, based on σ70, and the different regions of promoter DNA are shown. The promoter DNA is subdivided into multiple genetic elements: the upstream element (UPS), the -35 and -10 hexamers, the extended -10 region, the spacer, the discriminator (DIS), and the initial transcribed region (ITR). The core RNA polymerase, composed of the α2ββ’ω subunits, contacts the promoter at two regions: the two α subunits can form attractive interactions with an AT-rich UPS region, and the ββ’ crab claw wraps around the DIS and ITR regions. The σ70 family of sigma factors has four conserved domains; the three that are shown have been crystallized, while σ1 is disordered in solution. The σ2, σ3, and σ4 domains respectively form contacts with the -10 hexamer, extended -10, and -35 hexamer regions. The σ70 housekeeping sigma factor contains all four domains, while the σS general stress response factor contains only the σ2 and σ3 domains, thus explaining why it only binds to the -10 and extended -10 regions.
A plaster atomic model of the RNA polymerase-σ factor (holoenzyme) complex bound to promoter DNA is shown. The model shows how the (light green) -35 and -10 hexamers in the (red) promoter DNA contact the (orange) σ2, (blue) σ3, and (dark green) σ4 sigma factor protein domains. The (yellow) linker between the σ3 and σ4 domains and the (grey) core RNA polymerase are also shown. (Image courtesy of the Pingry School in Martinsville, NJ in coordination with Dr. Tim Herman of the Milwau-
A mathematical model of an AND protein device. (a) From a library of components, protein domains and peptide ligands are fused together to create a system of interacting fusion proteins. (b) The binding and unbinding of the ligands and their corresponding protein-protein interaction (PPI) domains generate eight different bound scaffold complexes, numbered C1 through C8, and the free scaffold activator, SA. The dynamics of these non-linear interactions are described using either deterministic ordinary differential equations or as a stochastic process. The rate of transcriptional initiation is computed using an equilibrium partition function with nine different promoter configurations. (c) Four configurations contribute to transcriptional initiation, yielding the probability of finding the promoter region in a transcriptionally ready state. The rate of transcriptional initiation is calculated using this probability and the kinetic rate of
is the non-shaded region and denotes the allowable r_DBP1 and r_DBP2 that will not result in excessive false positive activation should the production of one DNA-binding protein
(a) The rate of false positive transcriptional initiation, r_FI, is measured with respect to increasing affinity (K_d) of the protein-protein interaction domains with their peptide ligand, while setting the production rate of DBP1, r_DBP1, to (blue) 0.1, (green) 0.5, (red) 1.0, (cyan) 2.0, (magenta) 3.0, or (yellow) 4.0 proteins/second. The false positive rate of transcriptional initiation is biphasic with respect to the K_d, quickly increasing from basal expression to a maximum at K_d ≈ 1-3.5 μM and then slowly decreasing back to basal levels. (b) The r_FI is measured over increasing r_DBP1 while varying ΔG_D. Small decreases in ΔG_D lead to large increases in the false positive activation rate.
(a) The rate of false positive transcriptional initiation, r_FI, is measured with increasing production rates of DNA-binding protein 1, r_DBP1, while varying the constitutive production rate of both competitively inhibiting proteins, r_CIP. A large r_CIP is required to sufficiently decrease the false positive activation rate. (b) The r_FI is measured with increasing r_DBP1 while varying the constitutive production rate of the scaffold activator, r_SA. Small increases in r_SA lead to large increases in the false positive activation rate.
The operating range for the production rates of the DNA-binding proteins, r_DBP1 and r_DBP2, and the corresponding percentage of maximum transcriptional initiation are shown for four different sets of molecular components and production rates. The contour lines and the color scales on the axes are described in Figure 3.4. Deviations from the baseline parameters are as follows: (a) (r_SA, r_CIP) = (0.01925 p/s, 1.925 p/s), (b) (r_SA, K_d) = (0.01925 p/s, 50 μM), (c) (r_CIP, K_d) = (1.925 p/s, 50 μM), (d) (r_SA, r_CIP, K_d) = (0.01925
The operating range for the production rates of the DNA-binding proteins, r_DBP1 and r_DBP2, and the corresponding percentage of maximum transcriptional initiation are shown using different sets of molecular components. The PPI domain-ligand K_d is 0.01 μM and the DNA-binding proteins’ ΔG_D is either (a) -6.5, (b) -7.0, (c) -7.5, or (d) -8.0 kcal/mol. All other parameters are at their baseline values. The operating ranges are (a) >3.85 proteins/sec (p/s), (b) 0.39 p/s, (c) 0.075 p/s, and (d) 0.027 p/s.
The stochastic dynamics of (a, c, e) the rate of transcriptional initiation, r_TI, and (b, d, f) the number of molecules of complexes (blue) C5, (green) C7, and (red) C8 in response to step changes in r_DBP1 and r_DBP2 using different molecular components. At time 0, both r_DBP1 and r_DBP2 are zero proteins/sec (p/s) (plus signs). At one and two hours, respectively, r_DBP1 and r_DBP2 are increased to 0.5 p/s (crosses). At four and six hours, respectively, r_DBP1 and r_DBP2 are decreased to zero p/s. The K_d and ΔG_D are 0.01 μM and -6.5 kcal/mol, respectively. The r_CIP and r_SA are respectively (a, b) 0 p/s and 0.0385 p/s, (c, d) 0.1925 p/s and 0.0385 p/s, and (e, f) 1.925 p/s and 0.0193 p/s.
Two alternative gene networks that exhibit AND-like Boolean behavior. (a) Two DNA-binding proteins, DBP1 and DBP2, bind weakly to their respective operators, but possess (red dashed lines) strong surface interactions with each other and RNA polymerase. Only when both DBP1 and DBP2 are present in sufficient concentration will they strongly bind to their operators, recruit RNA polymerase, and transactivate transcriptional initiation. (b) A three gene network uses three sterically repressing DNA-binding proteins, DBP1, DBP2, and DBP3, to initiate transcription with AND-like behavior. Only when both DBP1 and DBP2 are present in sufficient concentration will they significantly repress production of DBP3. After remaining molecules of DBP3 have sufficiently degraded, RNA polymerase may bind to the output’s promoter and begin transcriptional initiation.
Examples of protein devices that activate or repress transcriptional initiation according to compound Boolean behaviors. By fusing additional DNA-binding domains to already utilized peptide ligands, the protein device activates according to compound AND-OR Boolean behavior with either (a) 3 or (b) 4 regulatory inputs. (c) By adding an additional PPI domain and peptide ligand, the protein device activates with AND-AND behavior with three regulatory inputs. (d) By removing the transactivation domain and surrounding the promoter with operators, the protein device represses transcriptional initiation with AND behavior, which is equivalent to NOT AND. (e) Using appropriately placed operators and DNA-looping, the protein device uses long range interactions to activate or repress with AND Boolean behavior. Each protein device (a-e) has one competitively inhibiting protein for each PPI domain (not shown).
The network connectivity for a lac-tet-ara oscillating gene network. Below, the sequences of the promoter regions, using a single, promoter-overlapping operator per gene.
(Top) The dynamic behavior of the Dlac, Dtet, and Dara proteins in the simple example over 27.7 hours. (Bottom) The cyclic covariance function of the same system.
The dynamical behavior of the (red) Dlac, (blue) Dtet, and (green) Dara proteins using the 3-2-1 operator configuration with mutant tet and lac operators and a TetR protein
(A) The normalized average cyclic correlation functions of the (red) Dlac, (blue) Dtet, and (green) Dara proteins using the 3-2-1 operator configuration with mutant tet, lac, and ara operators and a TetR protein half-life of 30 minutes. The period of oscillation is 16.2 ± 4.1 hours. (B) The same system as in (A), but with a TetR protein half-life of 10 minutes. The period of oscillation is now 15.3 ± 2.7 hours. The vertical gray lines represent the standard 68% confidence interval.
A plot of period of oscillation versus repressor-operator affinity for models with (A, B) 2 operators per gene and (C, D) 3 operators per gene. (A, C) The affinities of only one set of operators are modified while the other two sets of operators are held at an affinity of 10^10 M^-1. (B, D) The affinities of all operator sites in all genes are symmetrically altered. The forward kinetic constant is always
Repressor-operator affinity asymmetry causes asymmetric pulse widths. (Left) The dynamics of the Dlac protein from a model with 2 operators per gene, with one operator having a different repressor-operator affinity from the others, which have repressor-operator affinities of 10^10 M^-1. (Right) The dynamics of the Dlac protein from a model with 2 operators per gene, each having the same repressor-operator affinity.
The effect of initial ribosome and RNA polymerase numbers on the period of oscillation. The number of initial RNA polymerases and ribosomes is, respectively, varied
The effects of protein and mRNA half-lives on the period of oscillation. (A) The half-lives of all mRNA species are symmetrically varied from 5 to 15 minutes. (B) The half-lives of 2 mRNA species are fixed at 5 minutes while the half-life of the third species is varied from 5 to 15 minutes. (C) The half-lives of all protein species are symmetrically varied from 10 to 60 minutes. (D) The half-lives of all protein products
of 2 genes are kept constant at 20 minutes while the half-lives of all protein products of
The annotated sequence of the constructed synthetic promoter is shown. It contains two overlapping tetO2 operators from the tet operon and one overlapping lacO1 operator from the lac operon. The ribosome binding site (RBS) and the start codon of the cycle3
Agar plates streaked with DH5αPro E. coli cells containing the LT1 pGLOW plasmid, incubated at 37°C for 18 hours, are shown (A) without IPTG or aTC, (B) with 2 mM IPTG, (C) with 200 ng/mL aTC, and (D) with 2 mM IPTG and 200 ng/mL aTC.
Agar plates streaked with DH5αPro E. coli cells containing the LT1 pGLOW plasmid, incubated at 37°C for 36 hours, are shown (A) without IPTG or aTC, (B) with 2 mM IPTG, (C) with 200 ng/mL aTC, and (D) with 2 mM IPTG and 200 ng/mL aTC.
The effect of increasing the concentrations of both aTC and IPTG on the probability distribution of GFP fluorescence is shown at (A) 3 hours, (B) 6.5 hours, and (C) 9 hours after initial inoculation. The concentrations of the inducers are (green) 1 ng/mL aTC and 0.01 mM IPTG, (blue) 10 ng/mL aTC and 0.1 mM IPTG, (red) 100 ng/mL aTC and 1 mM IPTG, and (black) 200 ng/mL aTC and 2 mM IPTG.
The average GFP fluorescence over each concentration of aTC and IPTG is shown at (A) 3 hours, (B) 6.5 hours, and (C) 9 hours after inoculation.
The steady-state distributions of GFP fluorescence (at 9 hours) over varying concentrations of aTC and IPTG are shown. The means of these distributions are denoted by red
(A) The annotated sequence of the synthetic promoter is shown, including the lacO1 and tetO2 DNA operators, the -35 and -10 hexamer sequences, and the ribosome binding site (RBS). The Gibbs free energies between each repressor and its operator, which are $\Delta G_{\mathrm{LacI-DNA}}$ and $\Delta G_{\mathrm{TetR-DNA}}$, as well as the steric interactions between each repressor and the Holoenzyme, which are $\Delta G_{\mathrm{O1-RNAP}}$, $\Delta G_{\mathrm{O2-RNAP}}$, and $\Delta G_{\mathrm{O3-RNAP}}$, are also shown. (B) The Lac repressor tetramer is bound by up to 4 molecules of IPTG with an equilibrium association constant, $K^{\mathrm{IPTG}}$, resulting in a more positive Gibbs free energy between it and its operator, $\Delta G_{\mathrm{LacI:IPTG-DNA}}$. (C) The Tet repressor dimer is bound by up to 2 molecules of aTC with an equilibrium association constant, $K^{\mathrm{aTC}}$, resulting in a more positive Gibbs free energy between it and its operator, $\Delta G_{\mathrm{TetR:aTC-DNA}}$. (D) The Holoenzyme complex has binding affinity with the -35 and -10 hexamer sequences of the promoter with a Gibbs free energy of $\Delta G_{\mathrm{RNAP-DNA}}$. After it has successfully assembled on the promoter, the Holoenzyme initiates transcription in a first order reaction with a
in the unfolded state and can also be quantified by its Gibbs free energy of binding, $\Delta G_{\mathrm{folding}}$. Both of these Gibbs free energies may be calculated using a variety of programs, such as UNAFold [191]. After the 30S ribosomal subunit binds, the remainder of the ribosome assembles and initiates translation, followed by folding and maturation of the reporter GFP protein. (B) We show the mRNA secondary structure that sequesters the ribosome binding site at the 5’ end. (C) We show the RNA:RNA duplex that forms when the 5’ end of the mRNA, containing the RBS, binds to the 3’ end of
(A) The steady-state average GFP fluorescence over sixteen different concentrations of aTC and IPTG, with the background autofluorescence subtracted, is compared to the results of the steady-state mathematical model with (B) $\Delta G_{\mathrm{O1-RNAP}} = +25$ kcal/mol and (C) $\Delta G_{\mathrm{O1-RNAP}} = +0.5210$ kcal/mol. All other kinetic and thermodynamic parameters
in the upstream position while two lacO1 operators are located in the middle and down-
The predicted steady-state average GFP fluorescence of the suggested synthetic promoters, shown in Figure 4.13, assuming that $\Delta G_{\mathrm{O1-RNAP}} = +0.521$ kcal/mol, $\Delta G_{\mathrm{O2-RNAP}} = +1.5$ kcal/mol, and $\Delta G_{\mathrm{O3-RNAP}} = +1.0$ kcal/mol. (A) The first synthetic promoter in Figure 4.13 has the most AND-like logical response, but still contains noticeable plateaus of gene expression in the absence of either aTC or IPTG. This placement of tetO2 and lacO1 operators maximizes the efficiency of repression in the absence of either inducer. (B) Placing the lacO1 operator in the middle position and leaving the tetO2 operators in the less efficient positions results in an increase in the plateau of gene expression in the absence of aTC. (C) A single tetO2 operator, located in the second most efficient position, (D) Finally, placing only a single tetO2 operator in the least efficient position dramatically increases the low plateau of gene expression in the
The stochastic bifurcation diagram of the bistable Schlögl model, showing the stationary and non-stationary steady-state distributions to the (a) forward and (b) reverse kinetic Master equations with bifurcation parameter B. (c) The corresponding deterministic bifurcation diagram with (solid) stable and (dashed) unstable steady-state solutions. (d-h) Slices of the stochastic bifurcation diagram depicting the (black/solid) stable and (red/dashed) unstable steady-state solutions. The unstable solutions in (d, h) are non-
Chapter 1
Introduction
1.1 An Overview of our Methodology
Our main goal is to use advanced numerical methods to rationally design and construct synthetic biological organisms for improved biotechnological, medical, and industrial uses. To do this, we employ a mixture of theoretical, computational, and experimental techniques. We use a toolbox of well-characterized genetic components described by kinetic and thermodynamic data, detailed physical and chemical models of regulated gene expression, advanced stochastic numerical methods, and sensitivity, stability, and bifurcation analysis of stochastic systems to predict the DNA sequence of a synthetic gene network that possesses a desired dynamical behavior and performs a useful function. We then construct the hypothesized gene network in Escherichia coli and test its behavior under varying environmental conditions. By comparing our predictions to the experimental data, we determine the accuracy of the numerical methods, the kinetic and thermodynamic data, and the chemical models. Further, by constructing and testing a synthetic gene network whose behavior sensitively depends on a single unknown kinetic or thermodynamic constant, we may indirectly measure the missing data and expand the well-characterized toolbox of genetic components. This combined theory-experiment methodology provides a promising bottom-up route to the predictive rational design of synthetic biological organisms and the conversion of molecular biology into an engineering science.
1.2 Key Results Inside This Dissertation
As of January 2007, the following doctoral research has yielded six peer-reviewed publications [1, 2, 3, 4, 5, 6], with two additional ones in preparation [7, 8]. As a service to the reader, the key results, developments, and conclusions within this dissertation are succinctly listed below.
Related to the Mathematics
• (In Chapter 2) The hybrid jump/continuous Markov process stochastic simulation (HyJCMSS) algorithm dramatically speeds up the stochastic simulation of a system of coupled chemical and biochemical reactions if it contains at least one frequently occurring reaction with sufficiently large numbers of molecules (30-100) of the reactant and product species (called “fast/continuous” reactions). It approximates the occurrences of these fast/continuous reactions as a continuous Markov process, but retains the jump Markov process representation of the slow/discrete reactions. The resulting system of stochastic differential equations (SDEs), composed of the chemical Langevin equation and the newly derived differential jump equations, is solved using well-characterized stochastic numerical integrators. Consequently, the accuracy of the method is rigorously defined by the characteristics of the SDE numerical integrator.

[Figure 1.1 diagram: (A) Molecular Toolbox; (B) Synthetic Gene and Protein Networks; System of Bio/Chemical Reactions; (E) Design Techniques, including Sensitivity Analysis of Stochastic Systems and Fast Stochastic Numerical Solvers.]
Figure 1.1: A description of the theory/computational portion of our combined theory-experiment approach is shown. We combine genetic components from a well-characterized toolbox into synthetic gene and protein networks. We then model these networks as a system of chemical and biochemical reactions and simulate their stochastic dynamics using advanced hybrid and multiscale solvers. We also use various design techniques to identify which genetic components must be used to create a synthetic gene network with a desired behavior.

[Figure 1.2 diagram: Empirically Measured Data; Genetic Engineering; Chemical Synthesis of DNA; Selection of Bacteria; Culture under Varying Conditions; (C) Measure Single-Cell Dynamics of Reporter Gene Expression; Parameter Estimation; Calculate Missing Data; Computer-Aided Design Results.]
Figure 1.2: A description of the experiment portion of our combined theory-experiment approach is shown. We construct synthetic gene networks and measure their single-cell gene expression dynamics in Escherichia coli. We expand our toolbox of well-characterized genetic components by comparing the experimental data to the model results and calculating missing kinetic or thermodynamic data. We then repeat this process with additional synthetic gene networks to verify the data.
• (In Chapter 2) The equation-free probabilistic steady-state approximation (EF-PSSA) dramatically speeds up the simulation of a system of coupled chemical and biochemical reactions if it contains at least two frequently occurring reactions with small numbers of molecules of reactant or product species (called “fast/discrete” reactions). It dynamically determines when the system has converged to a probabilistic quasi-steady state, uses an equation-free sampling technique to calculate the marginal distribution of the fast dynamics, and substitutes those samples into the slow dynamics. It speeds up the simulation of a system with a separation of time-scales while accurately capturing the distributions of both the fast and slow dynamics.
• (In Chapter 2) The Hy3S software package (Fortran95, MPI parallelized), which is short for Hybrid Stochastic Simulation for Supercomputers, facilitates the dissemination of the developed stochastic numerical methods while making them more user-friendly. An accompanying graphical user interface (Matlab required) helps users to create the needed input files. The software package has numerous features that improve research productivity.
• (In Chapter 5) The reverse stochastic simulation algorithm simulates a trajectory of a jump Markov process with an inverted random vector field. We use it to approximate the actions of the reverse-time cocycle on a specified domain.
• (In Chapter 5) The iterative forward-reverse sampling (IFRS) procedure uses both forward and reverse time stochastic simulation to compute the bifurcation diagram of a system as a jump Markov process, including both its stable and unstable random attractors. Importantly, if the system is multimodal with long escape times between stable mesoscopic states, the forward-reverse iterative action enables it to avoid long simulation times while sampling all of the probable phase space. The method can also compute the non-stationary distributions of random Milnor attractors, such as those surrounding saddles. We use these newly developed stochastic numerical methods to calculate the stochastic bifurcation diagram of the bistable Schlögl chemical model and the Lyapunov exponents of its stable and unstable random attractors.
Related to Synthetic Gene Networks
• (In Chapter 3) We propose a new type of synthetic gene and protein network, called a protein device, that activates gene expression if and only if two different transcription factors are both present, thus mimicking the response of a Boolean AND logic gate. The protein device is a system of synthetic scaffold and scaffold-binding proteins composed of modular transactivating, protein-protein interaction, DNA-binding, and non-DNA-binding protein domains. We perform both a deterministic and a stochastic sensitivity analysis of the protein device to determine the characteristics of the protein domains that produce the most accurate Boolean response. We also suggest additional protein devices that mimic other Boolean logic gates, including ones with three or four inputs. One of the key advantages of these protein devices is that they are highly scalable; multiple independently acting AND gates can reuse the scaffold protein, reducing the total number of required engineered proteins.
• (In Chapter 3) We use our hybrid stochastic numerical methods and the cyclic covariance function to perform a systematic stochastic sensitivity analysis of an oscillating gene network. The gene network is composed of three genes whose regulatory and coding sequences are extracted from the lac, tet, and ara operons. We alter multiple experimentally mutable characteristics of the system, including the number and affinity of the DNA operators, the repressor protein and mRNA half-lives, and the concentrations of RNA polymerase and ribosome, to determine their effect on the period, amplitude, and robustness of oscillations. The final conclusions yield the set of genetic components that, when combined, will produce an oscillating gene network with a desired period.
• (In Chapter 4) We experimentally constructed a synthetic promoter in Escherichia coli (DH5αPro) that maximally expresses a reporter fluorescent protein, gfp, only when two chemical inducers, aTC and IPTG, are both added to the system in sufficient concentration. The sequence of the synthetic promoter was designed from scratch by using pairs of synthesized oligonucleotides and contains two overlapping tetO2 operators and one overlapping lacO1 operator. Using fluorescence-activated cell sorting (FACS), we measure the steady-state expression of the synthetic promoter under sixteen different concentrations of aTC and IPTG. The activity of the synthetic promoter behaves like a fuzzy AND gate, with two plateaus of gene expression. Using a steady-state mathematical model of the system, we determine that the fuzzy Boolean response is caused by the Lac repressor’s insufficient ability to prevent the RNA polymerase from binding to the promoter. Using the experimental data, we calculate that the steric interaction between the Lac repressor and RNA polymerase is only $\Delta G \approx 0.3$ kcal/mol. Our results are a prototypical demonstration that one can design and debug a synthetic gene network using advanced numerical methods.
1.3 A Brief Introduction to the Mathematical Results
1.3.1 The Initial Motivation
Traditionally, the dynamics of chemical species participating in reactions have been described as a deterministic process, using ordinary or partial differential equations to describe their time evolution. However, through both theory [9, 10] and observation [11], it is now generally accepted that the chemical kinetics of reactions occurring in tiny volumes and with dilute reactant concentrations invalidate the continuity and differentiability assumptions that underlie a deterministic process. Instead, the system must be described as a type of stochastic process. Stochastic chemical kinetics describes the dynamics of chemical reactions when the system is small and far from the thermodynamic limit. While forward-thinking men, including Professors Neal Amundson and Doraiswami Ramkrishna [12, 13], saw the need for describing mesoscopic phenomena with stochastic processes, the field did not catch on until it was observed that biological systems are indeed small systems. The molecular interactions in a biological system may be deconstructed into a system of chemical and biochemical reactions and modeled using ordinary differential equations. Inside a single biological cell, however, there are numerous chemical species, such as regulatory proteins, mRNA transcripts, and DNA binding sites, whose numbers of molecules range from 1 to 100. The cellular processes that involve these chemical species, most notably regulated gene expression, will be the most affected by stochastic fluctuations. However, the stochastic fluctuations are only observable at the single-cell level; average measurements taken at the population level do not show these effects. Consequently, only with advances in single-cell measurement techniques, such as flow cytometry and optical fluorescence microscopy, did the observational evidence become widely available [11].
Now, we begin by describing the system of chemical reactions as a stochastic process. But not all stochastic processes are alike. Specifically, the stochastic process that exactly describes the chemical kinetics of the system is called a jump Markov process. A jump Markov process is memoryless; the rates of the reactions depend only on the number of reactant molecules at the current time. The state of the system is the (non-negative) integer number of molecules of each chemical species. The transitions at each state are the possible reactions that may occur. When a reaction occurs, the state of the system will transition to an adjacent state at a specific moment in time, called a jump or waiting time. The theory behind jump Markov processes and other stochastic processes will be a main topic of Chapter 2.
In the mid-1970s, two algorithms were developed to simulate the stochastic dynamics of a jump Markov process: Gillespie’s stochastic simulation algorithm (SSA) [10] and the N-Fold method by Bortz, Kalos, and Lebowitz (BKL) [14]. Neither of these algorithms was significantly utilized until a landmark paper [15] used the SSA to successfully explain the lytic-lysogenic switch in the lambda phage virus. However, both of these algorithms have disadvantages that make them impractical for simulating large, realistic biological systems. Their computational run times are proportional to the number of reaction occurrences in the system. If the system contains even a single frequently occurring reaction, then the computational time of the simulation will skyrocket. These deficiencies were personally experienced when generating the results described in my first paper [1], which studied how the regulatory DNA sequences in a bistable genetic circuit affected its stability and switch speed.
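For readers unfamiliar with the SSA, the following minimal Python sketch of Gillespie’s direct method illustrates why its run time grows with the number of reaction occurrences: every single firing requires drawing an exponential waiting time and selecting which reaction fires. The function names `ssa_direct` and `time_average` and the birth-death test system are illustrative assumptions, not code from Hy3S.

```python
import math
import random


def ssa_direct(x0, stoich, propensity, t_end, seed=0):
    """Gillespie's direct method for one trajectory of a jump Markov process.

    x0         -- initial molecule counts
    stoich     -- one state-change vector per reaction
    propensity -- function mapping a state to the list of transition rates a_j(x)
    """
    rng = random.Random(seed)
    t, x = 0.0, list(x0)
    times, states = [t], [tuple(x)]
    while True:
        a = propensity(x)
        a0 = 0.0
        for aj in a:                       # total rate, summed in a fixed order
            a0 += aj
        if a0 <= 0.0:                      # no reaction can occur
            break
        tau = -math.log(1.0 - rng.random()) / a0    # exponential waiting time
        if t + tau > t_end:
            break
        t += tau
        r = rng.random() * a0              # reaction j is chosen with probability a_j / a0
        j, cum = 0, a[0]
        while cum <= r:
            j += 1
            cum += a[j]
        x = [xi + vi for xi, vi in zip(x, stoich[j])]
        times.append(t)
        states.append(tuple(x))
    return times, states


def time_average(times, states, t_end, i=0):
    """Time-weighted average of species i over [0, t_end]."""
    total = 0.0
    for k, s in enumerate(states):
        t_next = times[k + 1] if k + 1 < len(times) else t_end
        total += s[i] * (t_next - times[k])
    return total / t_end


# Birth-death example: 0 -> X at rate 10, X -> 0 at rate 1 per molecule.
times, states = ssa_direct([0], [(1,), (-1,)], lambda x: [10.0, 1.0 * x[0]], t_end=200.0)
print(len(times) - 1, "reaction events; time-averaged X =",
      round(time_average(times, states, 200.0), 2))
```

Each simulated time unit here costs roughly 20 reaction events; scaling the birth and death rates up by a factor of 1000 would scale the work by the same factor, which is the bottleneck that motivates the methods below.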
Improving the computational efficiency of these simulation algorithms was the first motivation of the doctoral research. However, there are two ways of going about this. The first way is to attempt to optimize the computer science steps behind the numerical algorithm. This route was indeed successfully followed [16, 17], but did not result in the needed orders-of-magnitude reduction in computational time. The second way (which is, perhaps, the more powerful one) is to analyze the underlying mathematics, derive approximations that take advantage of some salient feature of the mathematics, and dynamically apply them only when they are valid. This is generally the “theme” of the developed stochastic numerical methods.
1.3.2 The First Two Stochastic Numerical Methods
The first two developed stochastic numerical methods improve the computational efficiency of simulating biological systems as a Markov process while retaining accuracy. However, they each utilize very different approximations. Further, these stochastic numerical methods are widely available in a user-friendly software package.
HyJCMSS: A Hybrid Jump/Continuous Markov Process Stochastic Simulator
The first developed stochastic numerical method describes the system as a hybrid jump and continuous Markov process and uses standard stochastic numerical integrators to solve the resulting stochastic differential equations (SDEs). It dramatically speeds up the stochastic simulation of a system of chemical reactions when the system contains at least one “fast/continuous” reaction. A fast/continuous reaction is any reaction whose rate is large (> 10 reactions/second) and whose reactant and product species number greater than ~50-100 molecules. It can be used as a drop-in replacement for the mainstream stochastic simulation algorithm [10].
The key innovations of this stochastic numerical method are that it (a) dynamically partitions the system into “fast/continuous” and “slow/discrete” reaction subsets using a system-independent pair of parameters; (b) approximates the effects of the “fast/continuous” reaction subset as a continuous Markov process, governed by the chemical Langevin equation, which is a multi-dimensional system of Itô stochastic differential equations driven by multiple multiplicative Wiener processes; (c) exactly represents the times of the occurrences of the “slow/discrete” reactions as a jump Markov process with time-dependent transition rates, whose jump times are governed by the newly derived differential jump equations, which are also a system of Itô SDEs; and (d) uses well-characterized stochastic numerical integrators to solve the resulting coupled system of SDEs.
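Part (a) can be illustrated with a short Python sketch. It applies the criterion stated above: a reaction is treated as fast/continuous when its transition rate exceeds a threshold and every species it touches exceeds a molecule-count threshold. The function name and the parameters `rate_threshold` and `count_threshold`, with defaults of 10 reactions/second and 100 molecules, are hypothetical choices matching the rough values quoted in this section; the published criterion may differ in detail, and in practice the partition is re-evaluated as the state changes.

```python
def partition_reactions(x, propensities, species_of, rate_threshold=10.0,
                        count_threshold=100):
    """Split reaction indices into fast/continuous and slow/discrete subsets.

    x            -- current molecule counts
    propensities -- current transition rates a_j(x), one per reaction
    species_of   -- species_of[j] lists the reactant and product indices of reaction j
    """
    fast, slow = [], []
    for j, a_j in enumerate(propensities):
        frequent = a_j > rate_threshold
        abundant = all(x[i] > count_threshold for i in species_of[j])
        (fast if frequent and abundant else slow).append(j)
    return fast, slow


# Example: X is abundant and turns over quickly; the gene G is a single copy.
x = [5000, 1]                       # [X, G]
a = [1000.0, 500.0, 0.01]           # 0 -> X, X -> 0, G -> G + X (rare)
species_of = [[0], [0], [0, 1]]
print(partition_reactions(x, a, species_of))
```

Here the production and degradation of X are classified as fast/continuous, while the rare gene-dependent reaction stays slow/discrete because it is infrequent and involves a single-copy species.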
While parts (a) and (b) of the algorithm were each tentatively explored in previous stochastic numerical methods, parts (c) and (d) are extremely important, and no one had yet put it all together to form a complete and robust algorithm. The two subsets of reactions are, of course, coupled, causing the times of the occurrences of the slow/discrete reactions to depend on the effects of the fast/continuous reactions. Previous hybrid methods had completely ignored this fact. To account for the state and time-dependence of the transition rates (also called reaction propensities) of a jump Markov process, we derived a system of differential equations that describe how fictitious quantities, called the reaction residuals, evolve over time.
These are very simple equations, written as
$$ dR_j\big|_t = a_j\,dt, \qquad R_j^0 = \log(URN_j), \qquad j = 1, \ldots, M^{\mathrm{slow}} \tag{1.1} $$
where $a_j$ is the transition rate of the $j$th slow/discrete reaction, $R_j|_t$ is the reaction residual corresponding to the $j$th slow/discrete reaction evaluated at time $t$, $URN_j$ is a uniform random number on (0,1), and $M^{\mathrm{slow}}$ is the number of slow/discrete reactions in the system. The transition rates can depend on the state of the system, time, or other environmental variables, such as temperature, pH, or pressure. The reaction residual is initialized negative and monotonically increases with a rate equal to the transition rate of its corresponding slow/discrete reaction. Like other residuals, the focus is on what happens when the residual is zero: the time at which the reaction residual touches zero corresponds to the “firing” time or jump time of its corresponding slow/discrete reaction. Like all jump Markov processes, the jump time is still exponentially distributed. The initial condition of the reaction residual is the negative of an exponentially distributed random number with rate $\lambda = 1$. The firing time of the reaction then becomes rescaled according to its transition rate, but because that transition rate ultimately depends on time, the rescaling must be described by a differential equation.
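The behavior of a single reaction residual can be checked numerically. The Python sketch below integrates Equation (1.1) with forward Euler steps from $R_j^0 = \log(URN_j)$ and reports the first time the residual reaches zero. For a constant transition rate $a$ the exact firing time is $-\log(URN_j)/a$, and for the time-dependent rate $a(t) = t$ it is $\sqrt{-2\log(URN_j)}$, so both cases can be compared against the integration. The function name, step size, and the omission of any state dependence in the rate are illustrative assumptions.

```python
import math


def residual_firing_time(rate, urn, dt=1e-3, t_max=1e3):
    """Integrate dR/dt = rate(t) from R(0) = log(urn); return the first time R >= 0.

    rate -- transition rate as a function of time (state dependence omitted here)
    urn  -- uniform random number on (0, 1)
    """
    r, t = math.log(urn), 0.0       # the residual starts negative
    while r < 0.0:
        if t >= t_max:
            return math.inf         # the reaction did not fire within t_max
        r += rate(t) * dt           # forward Euler step of the jump equation
        t += dt
    return t


u = 0.5
t_const = residual_firing_time(lambda t: 2.0, u)     # constant rate a = 2
t_ramp = residual_firing_time(lambda t: t, u)        # time-dependent rate a(t) = t
print(t_const, -math.log(u) / 2.0)
print(t_ramp, math.sqrt(-2.0 * math.log(u)))
```

In both cases the integrated firing time agrees with the exact value to within about one step size, which is the behavior expected from a first-order scheme.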
Differential equations are more powerful than the simple algebraic expressions that had been previously used to describe the jump times of a jump Markov process. For one, they can be Taylor expanded. For example, the first-order Taylor expansion of Equation (1.1) is simply
$$ R_j\big|_{t+\Delta t} \approx R_j\big|_t + a_j(t)\,\Delta t. $$
Part (d) simply takes the very next step. The chemical Langevin equation is an Itô stochastic differential equation (in fact, it is the most difficult type of Markovian SDE to solve). The differential jump equations are also Itô SDEs because the transition rates depend on the state of the system, which is a Markov process. These two equations are, consequently, coupled and must be simultaneously solved using a stochastic numerical integrator for SDEs. The numerical integrators for SDEs are very different from their deterministic counterparts, but they are well-characterized and numerous good schemes have been created [18]. The presence of “noise” notwithstanding, there is both a rigorous and exact method of calculating the difference between the numerical and exact solutions of a stochastic differential equation. The numerical errors of these integrators have been carefully and rigorously studied. We can thus take advantage of good mathematical theory to obtain a rigorous answer to a difficult problem. Thus part (d) increases the rigor of developing hybrid stochastic numerical methods by converting the problem to a very standard mathematical one with a proper foundation. It also provides new avenues of improving the efficiency of the numerical algorithm by taking advantage of new stochastic numerical integrators, such as the balanced implicit method [19] or ones using adaptive time stepping [20]. We have already used adaptive time stepping schemes to improve the efficiency of this stochastic numerical method.
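To make part (d) concrete, the following Python sketch advances a toy hybrid system with fixed-step Euler-Maruyama. The fast/continuous reactions are integrated as the chemical Langevin equation, $dX = \sum_j \nu_j a_j(X)\,dt + \sum_j \nu_j \sqrt{a_j(X)}\,dW_j$, while each slow/discrete reaction residual is advanced by $a_j\,dt$ and the reaction fires when its residual crosses zero. It is a simplified illustration under stated assumptions (a fixed partition, a fixed step, and firings resolved only at step ends), not the Hy3S implementation, which also offers Milstein and adaptive schemes; the function name and toy model are hypothetical.

```python
import math
import random


def hybrid_euler_maruyama(x0, fast, slow, t_end, dt, seed=0):
    """Fixed-step hybrid integrator (illustrative sketch only).

    fast, slow -- lists of (state-change vector, propensity function) pairs
    Returns the final state and the number of slow/discrete firings.
    """
    rng = random.Random(seed)
    x = [float(v) for v in x0]
    # One reaction residual per slow/discrete reaction, initialized at log(URN) < 0.
    res = [math.log(1.0 - rng.random()) for _ in slow]
    t, n_fired = 0.0, 0
    while t < t_end:
        a_fast = [prop(x) for _, prop in fast]
        a_slow = [prop(x) for _, prop in slow]
        # Chemical Langevin equation: drift a_j dt plus noise sqrt(a_j) dW_j.
        dx = [0.0] * len(x)
        for (nu, _), a in zip(fast, a_fast):
            dw = rng.gauss(0.0, math.sqrt(dt))
            incr = a * dt + math.sqrt(max(a, 0.0)) * dw
            for i, nu_i in enumerate(nu):
                dx[i] += nu_i * incr
        x = [xi + dxi for xi, dxi in zip(x, dx)]
        # Differential jump equations: dR_j = a_j dt, rates taken at the step start.
        for j, ((nu, _), a) in enumerate(zip(slow, a_slow)):
            res[j] += a * dt
            if res[j] >= 0.0:                  # residual reached zero: fire reaction j
                x = [xi + vi for xi, vi in zip(x, nu)]
                res[j] = math.log(1.0 - rng.random())
                n_fired += 1
        t += dt
    return x, n_fired


def production(x):
    return 1000.0                              # 0 -> X


def degradation(x):
    return 1.0 * x[0]                          # X -> 0


def rare_synthesis(x):
    return 0.01 * x[0]                         # X -> X + Y


# Species [X, Y]: X is fast/continuous, Y is produced only by the rare reaction.
fast = [((1, 0), production), ((-1, 0), degradation)]
slow = [((0, 1), rare_synthesis)]
x_final, n_fired = hybrid_euler_maruyama([1000, 0], fast, slow, t_end=10.0, dt=1e-3)
print(x_final, n_fired)
```

Because the slow propensity depends on the continuously evolving X, the residual update is exactly where the two subsets couple; dropping it in favor of a fixed exponential waiting time would reproduce the error of earlier hybrid methods described above.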
In the original paper, the numerical method was called the “Next Reaction Hybrid” method, but that was frequently confused with Gibson and Bruck’s Next Reaction variant of the stochastic simulation algorithm [17]. Subsequently, in the article on the software simulation package [5], it is called the Hybrid Jump/Continuous Markov Process Stochastic Simulator (HyJCMSS) method. We use that name throughout this dissertation.
The Equation-Free Probabilistic Steady-State Approximation
The second developed stochastic numerical method uses the ergodic theory of Markov processes to identify and take advantage of any separation in time-scales that exists in the system of chemical reactions. It dramatically speeds up the stochastic simulation of an arbitrary system of chemical reactions when it contains at least a pair of fast/discrete reactions. It is complementary to the HyJCMSS method because the dynamics of a fast/discrete reaction may not be validly approximated as a continuous Markov process. For reasons stated below, we named it the equation-free probabilistic steady-state approximation (EF-PSSA).
The EF-PSSA stochastic numerical method (a) dynamically partitions the system into “slow/discrete” and “fast/discrete” reaction subsets using two system-dependent parameters; (b) detects when the effects of the fast/discrete reactions have caused the system to reach a probabilistic quasi-steady state; (c) extracts samples from the state of the system to calculate the marginal distribution of the “fast” species; (d) subsequently turns off the occurrences of the fast/discrete reactions; and (e) finally uses the extracted samples to determine the time of the next slow/discrete reaction occurrence and the state of the system from which it occurred.
In parts (a) and (b), we identify when the dynamics of the full multi-dimensional system have converged to a lower dimensional random manifold. On the random manifold, the fast dynamics have relaxed to a probabilistic quasi-steady state that very slowly evolves over time according to the slow dynamics. The distribution of that random manifold may be broken up into the marginal distribution of the fast dynamics and the conditional distribution of the slow dynamics.
Previous work has attempted to analytically calculate this marginal distribution, or at least its moments [21, 22], and substitute it into the transition rates of the slow/discrete reactions. By doing this substitution, one ignores the fast dynamics and focuses on simulating only the slow dynamics. However, for all but the simplest or linear systems (that is, with linear kinetics), this marginal distribution is analytically unsolvable. Consequently, for non-linear systems of chemical reactions, attempting to calculate the marginal distribution is just about a dead end.
There are three key innovations of our second developed stochastic numerical method. First, it avoids the problem of calculating the marginal distribution of the fast dynamics by using an equation-free sampling technique. The stochastic numerical method is not forced to approximate the shape or form of the marginal distribution in any way; it may be non-Gaussian or multimodal. Second, the method accurately simulates both the fast and the slow dynamics. Instead of simply calculating the average of the fast dynamics, our sampling technique allows us to reproduce the full distribution of the fast dynamics. Third, by calculating the time of the next slow/discrete reaction and the state from which it occurred, the distribution (and not simply the average) of the slow dynamics is also accurately captured. Consequently, we can dramatically speed up the simulation of the system without sacrificing accuracy in either the fast or slow dynamics. The principles behind the sampling technique were partially developed by the Kevrekidis group [23]. Thus, we felt it right to give the method the Kevrekidis moniker: equation-free. However, unlike the other equation-free methods, our sampling procedure performs neither “lifting” from the macroscopic to the mesoscopic system nor “restricting” from the mesoscopic to the macroscopic system. The samples are extracted from the mesoscopic system and are directly utilized to predict the future dynamics of the mesoscopic system.
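The sampling idea can be sketched in a few lines of Python. Assuming the fast subsystem has already relaxed (parts (a) and (b)), the sketch below runs the SSA on the fast/discrete reactions alone, records the fast state at regular intervals, and then uses those samples, without any closed-form marginal distribution, to draw the waiting time to the next slow/discrete reaction from the sample-averaged propensity and to pick the state from which it fires with probability proportional to its slow propensity. The function names, burn-in, sampling interval, and toy model (a fast isomerization $X_1 \rightleftharpoons X_2$ feeding a slow reaction with propensity $0.5 X_2$) are illustrative assumptions, not the EF-PSSA procedure as published.

```python
import math
import random


def sample_fast_subsystem(x, stoich, propensity, n_samples, interval, burn_in, rng):
    """Run the SSA on the fast/discrete reactions only and sample the state.

    The state is recorded every `interval` time units after `burn_in`.
    Assumes the fast subsystem always has a positive total rate.
    """
    t, x = 0.0, list(x)
    next_sample, samples = burn_in, []
    while len(samples) < n_samples:
        a = propensity(x)
        a0 = 0.0
        for aj in a:
            a0 += aj
        tau = -math.log(1.0 - rng.random()) / a0
        # The state is constant until the next jump, so record any sample times before it.
        while next_sample < t + tau and len(samples) < n_samples:
            samples.append(tuple(x))
            next_sample += interval
        t += tau
        r = rng.random() * a0
        j, cum = 0, a[0]
        while cum <= r:
            j += 1
            cum += a[j]
        x = [xi + vi for xi, vi in zip(x, stoich[j])]
    return samples


def next_slow_event(samples, slow_propensity, rng):
    """Waiting time to the next slow reaction and the fast state it fires from."""
    weights = [slow_propensity(s) for s in samples]
    mean_rate = sum(weights) / len(weights)
    tau = -math.log(1.0 - rng.random()) / mean_rate
    state = rng.choices(samples, weights=weights, k=1)[0]
    return tau, state


def fast_prop(x):
    # X1 -> X2 and X2 -> X1, each at 100 per molecule per time unit.
    return [100.0 * x[0], 100.0 * x[1]]


rng = random.Random(1)
fast_stoich = [(-1, 1), (1, -1)]
samples = sample_fast_subsystem([10, 0], fast_stoich, fast_prop,
                                n_samples=1000, interval=0.02, burn_in=0.1, rng=rng)
mean_x2 = sum(s[1] for s in samples) / len(samples)
tau, state = next_slow_event(samples, lambda s: 0.5 * s[1], rng)
print(mean_x2, tau, state)
```

For this toy model the marginal distribution of $X_2$ happens to be binomial with mean 5, so the sample mean can be checked, but the sampling code never uses that fact; the same procedure applies when the marginal is non-Gaussian or multimodal.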
Hy3S: Hybrid Stochastic Simulation for Supercomputers
The Hy3S software package is our attempt to disseminate these newly developed stochastic numerical methods to a wider audience while making them more user-friendly. The Fortran95 software package also comes equipped with a graphical user interface, written for Matlab (Mathworks). Overall, the package contains many useful features not found in other software packages, such as parameter and initial condition scanning, storage of both model and solution data in an optimized binary format, and non-Markovian special events.
The suite of programs is written in MPI parallelized Fortran95 and its source contains 8100 lines of code. The most significant parts of the program are contained in about 2500 lines of code. The remainder includes data input/output, random number generation, the optimizing data structures, and the rate laws. We focused on the supercomputing platform because such platforms are becoming more widely available and their usage significantly increases research productivity, especially when running on multiple processors. It is also not terribly difficult for an academic researcher to obtain access to a world-class supercomputing facility, such as the NSF-funded TeraGrid.
The Hy3S software package contains an optimized implementation of the Next Reaction variant of the stochastic simulation algorithm [17] and four different optimized versions of the HyJCMSS algorithm, which use fixed and adaptive time step schemes of the Euler-Maruyama and Milstein stochastic numerical integrators. Each of these numerical methods is available in both serial and MPI modes, creating a total of 10 different programs. However, compiler preprocessor directives were used to eliminate code redundancy.
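To make the two integrators concrete, here is a minimal Python sketch of fixed-step Euler-Maruyama and Milstein updates for a scalar Itô SDE, dX = a(X) dt + b(X) dW. It is not Hy3S code: the function `simulate` and its arguments are hypothetical, and the geometric Brownian motion test problem is an assumption chosen only because its exact solution makes the strong errors easy to compare. For the chemical Langevin equation with several independent Wiener processes, the Milstein scheme generally also requires iterated stochastic integrals unless the noise is commutative, which this scalar sketch omits.

```python
import numpy as np

def simulate(x0, drift, diffusion, diffusion_dx, h, dW, scheme="euler"):
    """Fixed-step integration of the scalar Ito SDE dX = a(X) dt + b(X) dW.

    dW holds one row of Wiener increments per step (shape (n_steps, n_paths)),
    so the same noise can be reused to compare both schemes path by path.
    """
    x = np.array(x0, dtype=float)
    for dw in dW:
        a, b = drift(x), diffusion(x)
        step = a * h + b * dw                      # Euler-Maruyama update
        if scheme == "milstein":
            # Milstein correction for scalar noise: 1/2 * b * b' * (dW^2 - h)
            step = step + 0.5 * b * diffusion_dx(x) * (dw**2 - h)
        x = x + step
    return x

# Assumed test problem: geometric Brownian motion dX = mu X dt + sigma X dW,
# with exact solution X_T = X_0 exp((mu - sigma^2 / 2) T + sigma W_T).
mu, sigma, T, n_steps, n_paths = 0.5, 1.0, 1.0, 1000, 500
h = T / n_steps
rng = np.random.default_rng(42)
dW = rng.normal(0.0, np.sqrt(h), size=(n_steps, n_paths))
x0 = np.ones(n_paths)
exact = x0 * np.exp((mu - 0.5 * sigma**2) * T + sigma * dW.sum(axis=0))

args = (lambda x: mu * x, lambda x: sigma * x, lambda x: sigma * np.ones_like(x), h, dW)
em_error = np.mean(np.abs(simulate(x0, *args, scheme="euler") - exact))
milstein_error = np.mean(np.abs(simulate(x0, *args, scheme="milstein") - exact))
print(f"mean strong error  EM: {em_error:.4f}   Milstein: {milstein_error:.4f}")
```

Halving h should shrink the Euler-Maruyama error by roughly a factor of √2 and the Milstein error by roughly a factor of 2, reflecting their strong orders of 0.5 and 1.0.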
The adaptive time step schemes for stochastic differential equations are significantly different from their deterministic counterparts. Because the numerical error is itself a random variable, making accurate decisions based on an a priori estimate of the numerical error is very difficult. Because the actual error fluctuates, increasing or decreasing the time step according to whether the estimated error is low or high will inevitably produce discrepancies between the estimated and actual errors, which may lead to a wrong or highly inaccurate numerical solution. Instead, one can measure the a posteriori numerical error of the solution and repeat the calculations using a modified time step. However, simply repeating the calculations with newly generated random numbers will bias the solution. To correct this bias, one must generate the new random numbers conditioned on those that have already been generated. The formal description of this procedure is called a Brownian bridge. The procedure consists of evaluating the increments of the Wiener process at successively smaller intermediate time steps, conditioned on the values of the Wiener process at the beginning and end times. We use these Brownian bridges to identify a time step that reduces the numerical error to a user-defined tolerance level.
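The core step of a Brownian bridge fits in a few lines. Given W(s) and W(t), the value at the midpoint u = (s + t)/2 is Gaussian with mean (W(s) + W(t))/2 and variance (u − s)(t − u)/(t − s) = (t − s)/4, so a rejected step can be split in half without discarding the increment already drawn. This is a minimal Python sketch under that standard result, not the Hy3S implementation; the function names are hypothetical, and the a posteriori error test that decides whether to refine is left out.

```python
import numpy as np

def bridge_midpoint(s, w_s, t, w_t, rng):
    """Sample W((s + t) / 2) conditioned on W(s) = w_s and W(t) = w_t."""
    mean = 0.5 * (w_s + w_t)
    std = np.sqrt((t - s) / 4.0)   # bridge variance (u-s)(t-u)/(t-s) at u = (s+t)/2
    return mean + std * rng.standard_normal()

def halve_steps(times, w, rng):
    """Insert a bridge sample at every midpoint of a sampled Wiener path.

    Existing values are kept, so refining a rejected step neither discards
    nor redraws the increments that were already used."""
    new_t, new_w = [times[0]], [w[0]]
    for i in range(len(times) - 1):
        mid = bridge_midpoint(times[i], w[i], times[i + 1], w[i + 1], rng)
        new_t += [0.5 * (times[i] + times[i + 1]), times[i + 1]]
        new_w += [mid, w[i + 1]]
    return np.array(new_t), np.array(new_w)

rng = np.random.default_rng(7)
times = np.linspace(0.0, 1.0, 5)                               # coarse grid, step 0.25
w = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(0.25), 4))))
fine_times, fine_w = halve_steps(times, w, rng)                # step 0.125, same coarse values
```

Applied repeatedly, `halve_steps` yields successively finer Wiener paths whose increments stay consistent with the coarse ones, which is what avoids the bias described above.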
1.3.3 Onwards to Random Dynamical Systems
Non-linear dynamics has always been a favorite topic. The qualitative behavior of dynamical systems has a simple elegance to it. There are only about a half dozen different categories of attractors, such as fixed points, limit cycles, or tori, to which the phase space of a dynamical system, governed by a partial or ordinary differential equation, will converge. By understanding the shapes of these attractors, one can study the long-term behavior of the system without touching its actual “dynamics”. One of the most important (and interesting) questions in engineering is what happens to these attractors when a parameter of the system is altered. The system’s long-term behavior may change dramatically with only a slight alteration of a single parameter: new attractors may appear, and others may be destroyed. These questions are the domain of bifurcation analysis.
The next question in the doctoral research was: can we apply bifurcation analysis to stochastic processes? Specifically, can we compute the stochastic bifurcation diagram of an interesting, non-trivial system of chemical reactions described as a jump Markov process and governed by the chemical Master equation? In addition, we are not simply interested in the moments of the time-invariant solution; we want to know how the full steady-state joint probability distribution behaves
along a parameter coordinate. The answer (as it turns out) is yes, we can do this. However, to find these answers in a rigorous way, we need to use a field of mathematics named random dynamical systems.
The field of random dynamical systems is a new one. Its foundational book was written by Ludwig Arnold and published in 1998 [24].* It attempts to merge two fields of mathematics: stochastic processes and dynamical systems. The combined mathematical jargon of both stochastic processes and dynamical systems is sure to mess with the minds of future graduate students. Consider a cocycle applied onto a measure space whose linearization yields a spectral decomposition of Oseledets random subspaces with associated Lyapunov exponents and rotation numbers. To professors, this is otherwise known as job security. Notwithstanding the esoteric language, the field of random dynamical systems will become important to engineering mesoscopic systems. Many of the tools that allow us to systematically study macroscopic dynamical systems, such as stability and bifurcation analysis, are grounded in dynamical systems theory. Consequently, it should not be surprising that using these same tools to study mesoscopic systems will require the study of random dynamical systems.
1.3.4 Numerical Methods for Stochastic Bifurcation Analysis
Because there is a scarcity of numerical methods for stochastic bifurcation analysis, we began by developing new ones, especially for studying jump Markov processes governed by Master equations. The steady-state probability distribution of such a Master equation may be converted into a standard Ax = 0 system of linear algebraic equations; however, the dimension of A is equal to the number of unique states in the system, which is typically very large and possibly infinite! Consequently, we need to develop kinetic Monte Carlo methods, like our previous algorithms, that sample the stable and unstable random attractors of the system instead of directly calculating their probabilities. In forward time, we can use the original stochastic simulation algorithm as well as other approximate or hybrid stochastic numerical methods, including the ones we developed, to simulate the forward-time dynamics and apply the forward-time cocycle to the phase space. In reverse time, however, there was no such method. By inspecting our newly derived differential Jump equations and mentally integrating them backwards in time, we were able to write down a numerical scheme for simulating the dynamics of a jump Markov process in reverse time. This Reverse Stochastic Simulation (RSS) method simulates a trajectory of a jump Markov process whose random vector field has been inverted. This is not necessarily equivalent to a time reversal of the trajectory’s path, but it does cause the trajectories to become attracted to the unstable portions of the phase space. Using the RSS method, we can apply the reverse-time cocycle to the phase space and calculate the unstable random attractor of a jump Markov process.
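To see why the direct Ax = 0 route quickly becomes impractical, consider the smallest possible case: a single species whose copy number x changes only by birth (x → x + 1) and death (x → x − 1). Truncating the state space at a hypothetical cap `n_max` gives an (n_max + 1) × (n_max + 1) matrix A, and the stationary distribution solves A p = 0 with Σp = 1. The sketch below assumes NumPy and a reflecting truncation, and the immigration-death rates are illustrative assumptions. With one species this is cheap; with several species the number of states is the product of the per-species ranges, which is what motivates sampling the attractors instead.

```python
import numpy as np

def stationary_distribution(birth, death, n_max):
    """Solve A p = 0, sum(p) = 1 for a birth-death master equation on 0..n_max."""
    n = n_max + 1
    A = np.zeros((n, n))
    for x in range(n):
        b = birth(x) if x < n_max else 0.0   # reflecting truncation at n_max
        d = death(x) if x > 0 else 0.0
        A[x, x] -= b + d                     # probability flux out of state x
        if x < n_max:
            A[x + 1, x] += b                 # flux x -> x + 1
        if x > 0:
            A[x - 1, x] += d                 # flux x -> x - 1
    # Columns of A sum to zero, so A is singular; swap one equation for sum(p) = 1.
    A[-1, :] = 1.0
    rhs = np.zeros(n)
    rhs[-1] = 1.0
    return np.linalg.solve(A, rhs)

# Assumed immigration-death example: production at rate k, degradation at rate g * x.
k, g = 10.0, 1.0
p = stationary_distribution(lambda x: k, lambda x: g * x, n_max=60)
print("mean copy number:", np.dot(np.arange(p.size), p))  # should be close to k / g = 10
```

For this example the exact stationary distribution is Poisson with mean k/g, which gives a convenient check on the truncated solution.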
However, there is an additional problem: many interesting systems contain multiple stable mesoscopic solutions whose trajectories only rarely jump from one to the other. If we attempt to calculate the stable random attractor of such a system by starting from an arbitrary initial condition and simply simulating trajectories forward in time, then the simulation time must be at least as long as the mean waiting time between these rare events. This is quite impractical. Instead, we developed an Iterative Forward-Reverse Sampling (IFRS) procedure that alternates between applications of the forward-time and reverse-time cocycles on the phase space until the phase spaces have converged to their respective stable and unstable random attractors. In reverse time, the trajectories become attracted to the unstable “hump” that prevents trajectories from crossing from one stable solution to another in forward time. This activation barrier of sorts is actually the unstable random attractor of the system, and its shape determines the frequency of the rare events.
*The reference is the second edition, which was published in 2002.
We use the IFRS procedure to calculate the stable and unstable random attractors while varying a parameter coordinate. We also use a form of first-order continuation to calculate the next set of random attractors along the parameter coordinate more efficiently. After determining the random attractors, we may linearize the cocycle around the attractors and calculate their Lyapunov exponents, which quantify the stability of the random attractors in a stochastic context. Consequently, we can construct the full stochastic bifurcation diagram of the system along with the corresponding Lyapunov exponents.
For random dynamical systems, there are two different definitions of a stochastic bifurcation. The first is a phenomenological or P-bifurcation, in which there is a qualitative change in the shape of the probability distribution of a random attractor at a critical parameter value. The second is a dynamical or D-bifurcation, in which a Lyapunov exponent of the random attractor crosses zero at a critical parameter value. In chapter 5, we use the RSS and IFRS stochastic numerical methods to generate the full stochastic bifurcation diagram of the well-stirred bistable Schlögl chemical model and identify its D-bifurcation points.
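For reference, the forward-time dynamics of the Schlögl model can be sampled with Gillespie's direct-method stochastic simulation algorithm. The four reactions are B1 + 2X → 3X, 3X → B1 + 2X, B2 → X, and X → B2, with B1 and B2 held at constant copy numbers. The rate constants below are assumed illustrative values, not necessarily those used in chapter 5; with them, the deterministic rate equation has stable states near x ≈ 85 and x ≈ 565 separated by an unstable state near x ≈ 250. This sketch covers only forward-time simulation; the RSS and IFRS procedures are not reproduced here.

```python
import numpy as np

# Assumed illustrative parameters (not taken from this dissertation).
C1, C2, C3, C4 = 3e-7, 1e-4, 1e-3, 3.5
N1, N2 = 1e5, 2e5                       # constant copy numbers of buffered species B1, B2
STOICH = np.array([+1, -1, +1, -1])     # change in X for each reaction

def propensities(x):
    return np.array([
        C1 * N1 * x * (x - 1) / 2,        # B1 + 2X -> 3X
        C2 * x * (x - 1) * (x - 2) / 6,   # 3X -> B1 + 2X
        C3 * N2,                          # B2 -> X
        C4 * x,                           # X -> B2
    ])

def ssa_schlogl(x0, t_end, rng):
    """Gillespie direct method: returns jump times and copy numbers of X."""
    t, x = 0.0, int(x0)
    times, states = [t], [x]
    while True:
        cumulative = np.cumsum(propensities(x))
        a0 = cumulative[-1]                 # total propensity (> 0 because C3 * N2 > 0)
        t += rng.exponential(1.0 / a0)      # exponential waiting time to the next reaction
        if t > t_end:
            break
        # choose reaction j with probability a_j / a0
        j = np.searchsorted(cumulative, rng.random() * a0, side="right")
        x += int(STOICH[j])
        times.append(t)
        states.append(x)
    return np.array(times), np.array(states)

rng = np.random.default_rng(0)
times, states = ssa_schlogl(x0=250, t_end=2.0, rng=rng)
print(f"{len(times) - 1} reactions; final X = {states[-1]}")
```

Trajectories started near the unstable state typically fall toward one of the two stable states, while switching between the stable states is rare on short time scales; that rarity is the difficulty the IFRS procedure is designed to overcome.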
1.3.5 Conclusions
We have developed two new stochastic numerical methods (HyJCMSS and EF-PSSA) that dramatically speed up the simulation of a coupled system of chemical and biochemical reactions when the system contains either one frequently occurring reaction or a separation of time scales. These two conditions are different; a system may have one or the other or both. Thus, these stochastic numerical methods are complementary and will eventually be combined to form a single stochastic numerical method. The software package Hy3S seeks to make these stochastic numerical methods more widely disseminated and user-friendly.
We have also developed new stochastic numerical methods that perform stability and bifurcation analysis on jump Markov processes. These methods include the reverse stochastic simulation algorithm and the iterative forward-reverse sampling procedure. Stochastic bifurcation analysis is a very new field, especially when applied to jump Markov processes, and these stochastic numerical methods will significantly advance its progress.
1.4 A Brief Introduction to the Biological Results
1.4.1 Motivation
The rate of progress in molecular biology is quite staggering. Fifty-five years ago, Watson and Crick determined the structure of DNA and its implications for genetic inheritance [25]. Within a decade, these and other early pioneers discovered that not only does the complementary nucleotide sequence of DNA encode the amino acid sequence of proteins [26], but it also determines the rate of protein production [27]. Thus, DNA not only contains the blueprint for the components of life; it
also controls the timing and frequency of their production. In the next two decades, the mechanisms behind gene expression and its regulation were elucidated. More recently, with the sequencing of numerous organisms, including the most commonly used bacterium [28] and our own species [29], we are beginning to decipher exactly how regulatory DNA sequences modulate the rate of RNA and protein production in response to environmental, metabolic, and regulatory stimuli.
We are still a long way from knowing the full blueprint of life, even for simple organisms. To demonstrate this, consider this simple question: what is the constitutive transcriptional initiation rate of the following promoter in Escherichia coli K-12, cultured in rich media so that it is in the exponential growth phase (in units of mRNA transcripts per second)?
5' - TACACACTTAGTATCCCAGTCGACCCGCTT TTTACA AATATTATGCGGGCCCC GATACT TTTAGAGCGCAACGACAATG - 3'
The sequences TTTACA and GATACT are the -35 and -10 hexamers that bind to the housekeeping σ70 factor of Escherichia coli; however, they deviate from the consensus sequence. The remaining sequences are randomly generated, but according to the anecdotal promoter “rules” governing the spacing and discriminator regions. This promoter is also not regulated by any transcription factors.*
Consequently, this should be an easy question. The only participating regulatory steps are Holoenzyme formation on the promoter and its successful transcriptional initiation. Yet the question remains unanswered because we still lack an understanding of how the rates of the individual molecular mechanisms depend on promoter sequence determinants. Recent research is studying this topic [30].
Why has progress in molecular biology more or less stumbled on these more quantitative types of questions? One reason is that, while the individual mechanisms involved in these questions are known, there has been less focus on systematically studying how the rates of these steps depend on sequence determinants and other factors. Such a systematic study requires a mathematical description, based on kinetics and thermodynamics, in order for it to be broadly applicable. Consequently, for molecular biology to move beyond a qualitative description of molecular mechanisms, we must convert it to an engineering science that combines the physical sciences, mathematics, and a systematic study of the causes and effects at the molecular level.
While the use of mathematics in biology is a broad topic [31], previous work has often simplified the system to reduce the number of equations, glossing over the molecular interactions that individually contribute to the macroscopic behavior of the system. While some understanding can be gained from this route, it does not provide a means to the rational design of biological systems. Instead, we must derive our mathematical models in terms of the physical and chemical characteristics of each molecular interaction in the system, thus linking a change in a molecular event, quantified by chemical kinetic constants and thermodynamic free energies, to a change in the system’s macroscopic observable behavior.
However, the large number of molecular interactions within a biological organism has stymied such a systematic analysis. Instead, we must restrict our study to a small subsystem of the organism, such as an isolated gene network, whose molecular interactions are already known to exist even if their quantitative characteristics are not. Consequently, if we can identify all of the interactions between a smaller, isolated portion of the organism and its larger cellular processes, then we may practically tackle the challenge of understanding how the macroscopic behavior of the isolated subsystem is affected by its molecular events. One way of creating a semi-isolated subsystem of the organism is to create a synthetic gene network composed of natural genetic elements with known functions. The DNA sequence of this network either directly or indirectly defines the kinetics and thermodynamics of the molecular interactions in the subsystem.
*Unless the random DNA sequence is a partial match for some DNA operator. This happens quite often!
Consequently, by making small changes to the DNA sequence, we may quantitatively analyze how small changes in the molecular interactions affect the macroscopic behavior of the gene network. By comparing the effects of different sequence determinants with the results of a mathematical model of the system, based in kinetics and thermodynamics, we can connect the changes in the DNA sequence to changes in the quantitative characteristics of the participating molecular interactions. Using the same type of mathematical model, we may then connect the effects of the participating molecular interactions with the behavior of the system and thus decipher the DNA sequences necessary for a desired system-level behavior.
The developing field of synthetic biology performs quantitative studies of small subsystems of biological organisms [32, 33, 34, 35, 36, 37, 38, 39, 40, 41], using techniques from engineering science to design, construct, test, and validate physical and chemical mathematical models that relate the DNA sequence of an organism to its system-level behavior. It usually takes a bottom-up approach, characterizing the genetic components in a toolbox and combining them to create a synthetic gene network that exhibits some desired dynamical or logical behavior. The components in the toolbox must be modular and context-free; they must have well-defined and understood molecular interactions whose kinetics and thermodynamics have been experimentally measured or, at least, accurately estimated. The behavior of two combined components must be only the sum of their molecular interactions and nothing more. Consequently, we may use rational design to combine the components into a desired synthetic gene network. In this doctoral research, our focus has been on developing computer-aided technologies to rationally design synthetic gene networks.
1.4.2 Computer Aided Design of Synthetic Gene Networks
The second part of the doctoral research is to use advanced numerical methods, including the ones we have developed, to design and construct synthetic gene networks. A synthetic gene network is a system of inter-regulated genes composed of regulatory and coding DNA sequences whose genetic components have been extracted from multiple organisms. The configurations of these gene networks have never existed in nature before, thus giving them the “synthetic” moniker. One of our main design goals is to develop synthetic gene networks with useful dynamical behaviors, such as bistable switching, oscillations, and logical decision-making. We use the numerical methods to identify the genetic components that, when combined, will produce these dynamical behaviors with certain desired characteristics, such as switch rate, oscillatory period, and response fidelity. Ultimately, these numerical results must be experimentally tested to determine their accuracy. Consequently, we construct simple synthetic gene networks in Escherichia coli and measure their single-cell gene expression dynamics under numerous environmental conditions. We then compare the results of a mathematical model of the system to the experimental data. Differences between the experimental and model data will suggest inaccuracies in our model. By focusing on simple synthetic gene networks, we can limit these differences to a small number of degrees of freedom. When the difference depends on only a single degree of freedom, such as the kinetic or thermodynamic constant of a specific molecular interaction, we can indirectly calculate this unknown parameter. A combination of additional synthetic gene networks and a total sensitivity analysis can determine the accuracy of this calculation. We can continue to calculate unknown kinetic or thermodynamic data by constructing additional synthetic gene networks that depend sensitively on these parameters. Through this process, the genetic components in these synthetic gene networks will become well characterized.
1.4.3 Protein Devices: A New Type of Synthetic Gene Network
Many potentially useful synthetic gene networks require the expression of an engineered gene if and only if two different DNA-binding proteins exist in sufficient concentration. While some natural and engineered systems activate gene expression according to a logical AND-like behavior, they often use allosteric or cooperative protein-protein interactions, rendering their components unsuitable for a toolbox of modular parts for use in multiple applications. We develop a quantitative mathematical model to demonstrate that a small system of interacting fusion proteins, called a protein device, can activate an engineered gene according to the Boolean AND behavior while using only modular protein domains and DNA sites.
The fusion proteins are created from transactivating, DNA-binding, non-DNA-binding, and protein-protein interaction domains along with the corresponding peptide ligands. These domains create a synthetic scaffold protein and multiple scaffold-binding proteins with varying activities, which may bind together to form various scaffold complexes. The synthetic scaffold complex will only bind to an engineered promoter and transactivate its gene expression if its two protein inputs, which are DNA-binding proteins fused to a short peptide tag, are both present in sufficient concentration. The engineered promoter contains two upstream DNA operators whose interactions with the DNA-binding proteins coordinate the positioning of the synthetic scaffold complex in the correct spatial configuration with respect to the RNA polymerase and σ factor.
Using a combined kinetic and thermodynamic model, we identify the characteristics of the molecular components and their rates of constitutive production that maximize the fidelity of AND behavior. We specifically measure how the false positive activation rate and the threshold to maximum activation vary with the kinetic and thermodynamic parameters of the system’s components. Interestingly, we find that the false positive activation rate is bimodal with respect to the affinity of the protein-protein interaction domains and their peptide ligands. We also define the operating range of the AND protein device, which describes the maximum concentration of the two input proteins that results in less than the maximum false positive activation, and determine how to maximize the operating range. In addition, we measure the stochastic dynamics of gene expression in response to step changes in the production rates of the two input proteins to determine how fast the AND protein device can turn gene expression on and off. Finally, we suggest additional protein devices with more complex Boolean functions, including ones with three or four protein inputs. These protein devices have a number of important advantages over existing synthetic gene networks, including their modularity, high fidelity, rapid response, and high scalability. Their high scalability is perhaps the most interesting of these advantages. Within the same cellular organism, the synthetic scaffold protein may be reused by additional pairs of protein inputs to regulate the expression of additional output engineered promoters. Consequently, even though the first protein device requires the engineering of five fusion proteins, each succeeding AND logic gate requires only two additional fusion proteins, which are the two inputs. Importantly, these additional inputs pro-
1.4.4 Oscillatory Synthetic Gene Networks
Oscillating gene networks produce sustained oscillations in the concentrations of proteins over long periods of time, typically much longer than the cell doubling time. They may be used in conjunction with other gene networks to keep track of the flow of time, which is important in regulating the circadian rhythms of flies and other animals [42] and other regulated cellular processes, including metabolism. A synthetic genetic oscillator was previously constructed in E. coli [43] by extracting genetic components from natural bacterial systems and creating a three-gene network with cyclic negative feedback loops.
We became interested in determining how the characteristics of the constituent molecular components of a similar gene network affect the period of the sustained oscillations and their robustness to molecular noise. We extracted genetic components, including repressor proteins and DNA operators, from the well-characterized lac, tet, and ara operons and used their literature-derived kinetic and thermodynamic parameters to construct a stochastic dynamical model of a three-gene network with cyclic negative feedback loops. We then performed a stochastic sensitivity analysis on the model by varying key parameters, including the number and affinity of the DNA operators, the half-lives of the repressor mRNAs and proteins, and the concentrations of free RNA polymerase and ribosomes, to determine their effect on the period of oscillations and on the variability of the period due to molecular noise.
Because the oscillations of individual cells are out of phase with one another, the steady-state probability distribution of a population of bacteria will not show sustained oscillations. Instead, we use the cyclic covariance function to calculate the period of the stochastic oscillatory limit cycle, assuming that the oscillating trajectories are a cyclostationary signal. The cyclic covariance function is the Fourier transform of the autocorrelation function of the oscillating protein concentration. Using it, we can also calculate the standard deviation of the period and, consequently, determine the robustness of the oscillations to molecular noise.
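As a simplified stand-in for the cyclic covariance analysis, the sketch below estimates an oscillation period from the peak of a trajectory's periodogram, which is the discrete Fourier transform of its circular sample autocovariance. Repeating it over many independent trajectories gives a mean period and its standard deviation. The function names are hypothetical, the estimator is a plain periodogram rather than a full cyclostationary analysis, and its resolution is limited to frequency bins of width 1/(N·dt).

```python
import numpy as np

def dominant_period(signal, dt):
    """Period at the largest non-zero-frequency peak of the periodogram."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                        # remove the mean (zero-frequency) component
    power = np.abs(np.fft.rfft(x)) ** 2     # periodogram: DFT of the circular autocovariance
    freqs = np.fft.rfftfreq(len(x), d=dt)
    k = 1 + np.argmax(power[1:])            # skip the zero-frequency bin
    return 1.0 / freqs[k]

def period_statistics(trajectories, dt):
    """Mean and standard deviation of the period across independent trajectories."""
    periods = np.array([dominant_period(tr, dt) for tr in trajectories])
    return periods.mean(), periods.std(ddof=1)
```

In practice each trajectory would be a simulated protein copy number sampled at a fixed interval dt; because each trajectory is analyzed separately, the random phase offsets between cells do not wash out the oscillation.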
Our conclusions identify the set of genetic components that result in sustained and robust oscillations. We have determined that, for each gene, placing only a single DNA operator in an overlapping position with the promoter does not result in sustained oscillations: the promoter is inefficiently repressed, so repressor production still occurs when the previous pulse has crested, eliminating the next pulse in the cycle. Instead, by using two or three overlapping DNA operators per promoter, sustained oscillations may be generated. In addition, if the affinities of the DNA operators vary too much across the three genes, the amplitudes of the pulses are too disparate, causing the stochastic oscillations to be fragile and unsustained. These conclusions may be directly tested by constructing an oscillating synthetic gene network with the specified genetic components, including mutant ones with differing kinetic and thermodynamic characteristics.
1.4.5 Bottom-Up Mathematical Analysis of a Synthetic Promoter
We employed a combined experimental, computational, and theoretical methodology by designing, constructing, characterizing, and mathematically analyzing a synthetic promoter that maximally expresses the gfp reporter gene only when aTC (anhydrotetracycline) and IPTG (isopropyl β-D-1-thiogalactopyranoside) are both added to the system in sufficiently high concentrations. The promoter is a completely synthetic sequence that we designed by combining genetic components from a well-characterized toolbox of parts. It is composed of one overlapping lacO1 operator, two overlapping tetO2 operators, a weak consensus -35 hexamer, a strong consensus -10 hexamer, and a ribosome binding site sequestered by a medium-strength secondary structure.
Under controlled exponential growth conditions, we measured the dynamic and steady-state gene expression of the synthetic promoter at sixteen different combinations of aTC and IPTG concentrations and at seven time points. We then constructed a steady-state mathematical model of the system, based on the kinetic and thermodynamic characteristics of all molecular interactions in the system, and compared the experimental data to the model results. The only free parameters in the model were the Gibbs free energies describing the steric interactions between the Lac and Tet repressors and the Holoenzyme. By fitting the model to the data, we calculated that the steric interaction between the Lac repressor, binding to an operator placed upstream of the -35 hexamer, and the Holoenzyme had a Gibbs free energy of ΔG = 0.3 kcal/mol. Consequently, even when the Lac repressor is bound in an overlapping position with the promoter, this weak steric interaction, measured by the barely positive Gibbs free energy, is unable to prevent the Holoenzyme from frequently binding and initiating transcription.
The steady-state mathematical model uses the chemical partition function to describe the probability of the Holoenzyme being in a “transcriptionally ready state”. The remaining portions of the model are a deterministic steady-state description of transcriptional initiation, translation initiation, degradation and dilution of RNA and protein, mRNA secondary structure formation, and the copy number of the plasmid. These latter portions of the model may be lumped together into a single multiplicative factor that does not depend on the concentrations of aTC and IPTG. Consequently, when comparing the experimental and model data over varying aTC and IPTG concentrations, only the chemical partition function need be considered.
The chemical partition function enumerates all possible unique regulatory states of the promoter, tallies the density of microstates and constituent Gibbs free energies for each state, and then calculates the probability of each regulatory state. The probability of the “transcriptionally ready state” is the sum of the probabilities of each state in which the Holoenzyme has successfully assembled on the promoter. These probabilities depend on the Gibbs free energies of the protein-DNA interactions at the promoter, the concentrations of RNA polymerase, Lac repressor, and Tet repressor, and the concentrations of aTC and IPTG. Because the lac and tet systems are so well characterized, the thermodynamic constants for most of their molecular interactions have been empirically measured. The remaining ones, such as the steric interactions between the repressors and RNA polymerase, are position dependent. Consequently, we may use the mathematical model to calculate these missing data and determine the position dependence of the steric interaction.
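A toy version of the partition-function calculation can be written as a brute-force enumeration over binding sites. In this sketch each occupied site contributes a statistical weight (for example, free protein concentration times exp(−ΔG_bind/RT)), and each pair of co-bound proteins contributes exp(−ΔG_int/RT), where a positive ΔG_int is a steric penalty. The site names, weights, and 37 °C temperature are hypothetical; only the 0.3 kcal/mol value echoes the fitted Lac repressor–Holoenzyme steric interaction quoted above. The actual model also accounts for multiple operators, inducer binding, and microstate degeneracy, which this sketch omits.

```python
import itertools
import math

R = 1.987e-3          # gas constant, kcal/(mol K)
T = 310.15            # assumed growth temperature, 37 C
RT = R * T

def ready_probability(weights, interactions, ready_site):
    """Probability that `ready_site` is occupied, summed over all occupancy states.

    weights: site -> statistical weight when that site is occupied
    interactions: frozenset({site_a, site_b}) -> pairwise free energy (kcal/mol)
    """
    sites = list(weights)
    z_total = z_ready = 0.0
    for occupancy in itertools.product((0, 1), repeat=len(sites)):
        bound = [s for s, occupied in zip(sites, occupancy) if occupied]
        w = math.prod(weights[s] for s in bound)      # empty state has weight 1
        for a, b in itertools.combinations(bound, 2):
            w *= math.exp(-interactions.get(frozenset((a, b)), 0.0) / RT)
        z_total += w
        if ready_site in bound:
            z_ready += w
    return z_ready / z_total

# Hypothetical weights; 0.3 kcal/mol is the weak steric penalty quoted in the text.
weights = {"holoenzyme": 2.0, "LacI": 50.0}
steric = {frozenset(("holoenzyme", "LacI")): 0.3}
print(ready_probability(weights, steric, "holoenzyme"))
```

With these illustrative weights the Holoenzyme occupancy is about 0.55, compared with 2/3 with no repressor present and about 0.04 under complete steric exclusion, which shows why a barely positive ΔG leaves the promoter largely active.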
These experiments serve as a prototypical demonstration of a combined experimental, computational, and theoretical methodology. They present a first step towards a route to indirectly measure the kinetic and thermodynamic characteristics of existing or new genetic components. By repeating this procedure, we can continue to expand the well-characterized toolbox of genetic parts. Through this process, we may also design new synthetic gene networks that exhibit dynamical or logical behaviors that are useful in biotechnological, medical, and industrial applications.
1.4.6 Conclusions
By using both deterministic and stochastic numerical methods as well as experimental characterization and comparisons, we are well on our way towards the successful rational design of synthetic gene networks. In the coming years, we will witness the expansion of an available toolbox of well-characterized genetic components whose combination will result in a wide variety of synthetic gene networks. These synthetic gene networks can exhibit a variety of useful and interesting dynamical or logical behaviors and can be designed to improve existing biotechnological applications or developed for revolutionary new applications, such as gene therapy or whole-cell biosensors.