Identification of small scale biochemical networks based on general type system perturbations

Henning Schmidt1, Kwang-Hyun Cho2,3 and Elling W. Jacobsen1

1 Signals, Sensors and Systems, Royal Institute of Technology – KTH, Stockholm, Sweden
2 College of Medicine, Seoul National University, Chongno-gu, Seoul, Korea
3 Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Korea

Keywords: biochemical networks; identification; Jacobian; time-series measurements

Correspondence: E. W. Jacobsen, Department of Automatic Control, Royal Institute of Technology – KTH, Osquldasvag 10, S-10044 Stockholm, Sweden. Fax: +46 8790 7329. Tel: +46 8790 7325. E-mail: jacobsen@s3.kth.se
K.-H. Cho, College of Medicine, Seoul National University, Chongno-gu, Seoul, 110–799, Korea, and Korea Bio-MAX Institute, Seoul National University, Gwanak-gu, Seoul, 151–818, Korea. Fax: +82 2887 2692. Tel: +82 2887 2650. E-mail: ckh-sb@snu.ac.kr

(Received 22 December 2004, accepted February 2005)

doi:10.1111/j.1742-4658.2005.04605.x

New technologies enable acquisition of large data-sets containing genomic, proteomic and metabolic information that describe the state of a cell. These data-sets call for systematic methods enabling relevant information about the inner workings of the cell to be extracted. One important issue at hand is the understanding of the functional interactions between genes, proteins and metabolites. We here present a method for identifying the dynamic interactions between biochemical components within the cell, in the vicinity of a steady-state. Key features of the proposed method are that it can deal with data obtained under perturbations of any system parameter, not only concentrations of specific components, and that the direct effect of the perturbations does not need to be known. This is important as concentration perturbations are often difficult to perform in biochemical systems and the specific effects of general type perturbations are usually highly uncertain, or unknown. The basis of
the method is a linear least-squares estimation, using time-series measurements of concentrations and expression profiles, in which system states and parameter perturbations are estimated simultaneously. An important side-effect of also employing estimation of the parameter perturbations is that knowledge of the system's steady-state concentrations, or activities, is not required and that deviations from steady-state prior to the perturbation can be dealt with. Time derivatives are computed using a zero-order hold discretization, shown to yield significant improvements over the widely used Euler approximation. We also show how network interactions with dynamics that are too fast to be captured within the available sampling time can be determined and excluded from the network identification. Known and unknown moiety conservation relationships can be processed in the same manner. The method requires that the number of samples equals at least the number of network components and, hence, is at present restricted to relatively small-scale networks. We demonstrate herein the performance of the method on two small-scale in silico genetic networks.

FEBS Journal 272 (2005) 2141–2151 © 2005 FEBS

New high-throughput experimental technologies, i.e. for monitoring the expression levels of large gene sets and the concentrations of metabolites, are evolving rapidly. These data sets contain the information required to uncover the organization of biological systems on a genetic, proteomic, and metabolic level. However, in order to realize the translation of data into a system level understanding of cell functions, methods that can construct quantitative mathematical models from data are needed. In particular, determination of the quantitative interactions between the components within and across these levels is an important issue. These interactions lead to the notion of networks that can be represented by weighted, directed graphs, where the
nodes correspond to the biochemical components and the edges, represented as arrows with weights attached, indicate the direct quantitative effect that a change in a certain component has on another component. The weights are in general nonlinear functions that represent reaction kinetics. Determination of these network structures will provide insight into the functional relationships between the involved components, so as to better understand the functions of biological systems, and will eventually lead to knowledge concerning how these systems can be manipulated in order to achieve a certain desired behavior.

Due to the fact that the reaction kinetics in general are unknown, and because of the large number of parameters involved, it is in most cases unfeasible to determine directly the nonlinear weights from experimental data. Herein, however, a distinction has to be made between gene and metabolic networks. For metabolic networks, a good initial guess of the network structure is usually available from databases, such as KEGG [1], while the structures of gene networks usually are largely unknown in advance. Therefore, the approach presented in this paper probably has its greatest value for gene networks, but can be applied equally well to signaling and metabolic networks where, e.g., model validation and the determination of new, previously unknown, connections between intermediates are needed.

A common approach in structural identification is to consider the biochemical network behavior around some steady-state and assume that it behaves linearly for small deviations from this steady-state [2–4]. With this assumption, the network weights become constants, quantifying the interactions between the components in the neighborhood of the steady-state. Grouping these constant weights into a matrix yields an interaction matrix, the Jacobian, which quantifies the mutual effects of deviations from the steady-state on the various components of the system.

Several approaches to the
determination of interaction matrices of biochemical systems have been published recently. These can be divided roughly into methods focusing on the determination of the qualitative structure of the interactions and those aimed at determining quantitative information about the interactions.

Ross [5] reviews two approaches to determine the structure of reaction pathways from time-series measurements of metabolites and proteins. The first approach is based on small pulses of concentration changes applied to the different species around a stable steady-state. Depending on the relative behavior of the measured responses, the considered metabolic pathway can be determined [6,7]. The second approach is based on correlations between different species when periodically forcing the system by changing some input species over time. Using correlation and multidimensional scaling analysis, the structure of the considered pathway can be unravelled [8].

Kholodenko et al., Gardner et al. and Vance et al. propose methods for determining quantitative interaction matrices based on steady-state responses of perturbed genetic networks [2,3,9]. As the responses to the applied perturbations can often become relatively large in steady-state, these methods are potentially limited depending on the nonlinearity of the considered systems. Furthermore, the fact that Kholodenko et al. and Vance et al. determine the $n^2$ elements of the interaction matrix from $n^2$ measurements suggests that the results are potentially sensitive to measurement uncertainty [2,9].

In contrast to methods based on steady-state measurements, methods based on time-series measurements can cope better with the issue of nonlinearity and measurement uncertainty. Monitoring time-series also enables significantly more information to be extracted in each experiment. A widespread method in the identification of reaction networks using time-series measurements is a least-squares estimation of the Jacobian. An interesting
method is presented by Mihaliuk et al., in which the idea is to apply perturbations to all components in the network and to determine the Jacobian by measuring only one component, or a linear combination of components [10]. A drawback of this method, for the application to biological systems, is the fact that the perturbations are assumed to be instant changes of the concentrations of intermediates in the network. Furthermore, the magnitude of these perturbations is assumed to be known.

The use of concentration shift experiments, that is, adding specific components to the system as pulses or steps, is a typical assumption in many previously proposed methods. However, while such perturbations are mainly feasible in chemical systems, they are usually hard to realize in vitro or in vivo [11]. To overcome the restriction to concentration shift experiments, Sontag et al. derive a method based on parameter perturbations, in which a separate experiment is performed for each network component so that the perturbation has no direct effect on this component, that is, the designed perturbation only works indirectly through other components in the network [4]. However, this requires substantial a priori structural knowledge, and furthermore causes problems with rank deficient measurement matrices. (The latter is discussed in more detail in the supplementary material.)
Herein, we study biochemical networks involving genes, proteins, and/or metabolites, and consider determination of the Jacobian using least-squares estimation from time-series measurements obtained in the vicinity of some steady-state. The method is able to deal with very general types of system perturbations. In this paper we assume the use of constant parameter perturbations, such as gene knockouts and inhibitor additions. For these types of perturbations, the exact size as well as the direct effect of the perturbations will in general be largely unknown. We therefore also consider incorporating a determination of the perturbation itself from the available data. However, the method can be applied equally when pulse perturbations are realizable for a given network, and in the case of known or unknown time-varying parameter perturbations. Furthermore, it is possible to combine pulse and parameter perturbations. (The use of the method for other types of perturbations is discussed in the supplementary material.)
Furthermore, we show that the effect of unsteady-state initial conditions can be considered an unknown perturbation and hence can be estimated in the same manner. Due to the latter feature, the proposed method does not, in contrast to most other methods, require the system to be in a steady-state when the perturbations are applied, nor does it require knowledge of the steady-state activities and concentrations. However, as the method is based on the assumption that the network is behaving linearly around the same steady-state for all experiments, the initial states of all experiments should, in general, not be too far from the steady-state at which the Jacobian is to be determined.

Network modelling based on time-series data requires estimation of time derivatives of the states. These are commonly calculated through the use of some Euler type finite difference approximation. Herein we employ a representation of time derivatives, commonly used in systems theory, that avoids any approximations, thereby leading to significantly improved estimation results.

Finally, we address the issue of dynamics that are significantly faster than the sampling time, and show how such interactions can be identified and extracted from the data-sets prior to the network identification. Thus, a reduced network, with the fast dynamics replaced by algebraic relationships, can be identified. As we show, the same approach is also applicable in the case of moiety conservations in metabolic and signaling networks, and thus it is possible to determine the Jacobian expressed only in terms of the independent intermediates of the network.

The proposed method will in general uncover only a phenomenological interaction topology of the network, as not all intermediates can be measured. That is, we assume that only the measured components are part of the network to be modelled. This is a common assumption often
used [4]. This assumption is relaxed somewhat by Mihaliuk et al. [10]. However, they assume that all components are known and are possible to perturb. The case of unknown and unmeasurable components is, of course, a highly relevant topic, but outside the scope of this paper.

The outline of the paper is as follows. We first present the problem formulation and briefly outline the method used for network identification from measurement samples. The proposed method is then applied to in silico models of two small scale gene networks. Following the conclusions, we present a detailed description of the method for least-squares identification of the Jacobian, and discuss the impact of the sampling time and moiety conservations.

Results and Discussion

Problem formulation

We consider metabolic reactions, signaling networks, and gene networks that can be described by a system of nonlinear differential equations of the form in Eqn (1)

$$\dot{x}(t) = f(x(t), p), \tag{1}$$

where $x = [x_1, \ldots, x_n]^T$ is the state vector containing the concentrations, activities, or expressions, of all components in the network and $p = [p_1, \ldots, p_q]^T$ is a vector of adjustable parameters within the considered biological system, such as kinetic rate constants and genes whose expression levels can be perturbed. The vector valued function $f$ determines the dynamics of the biochemical network given the states and parameters. The definition in Eqn (1) also incorporates the typical form of kinetic models, that is, $\dot{s} = NV(s, p)$ [12]. In cases of small molecular concentrations and/or low levels of diffusion, partial differential and stochastic equations may be required, but this is outside the scope of this paper.

Due to largely unknown reaction kinetics, and the large number of involved parameters, it is in general unfeasible to determine the nonlinear functions $f_i(x, p)$ using a 'top-down' approach, that is, determining all reaction mechanisms and involved parameters, such as rate constants, from measured responses of the perturbed
network. We therefore consider the system (Eqn 1) in the neighborhood of some steady-state $(x_0, p_0)$ and assume that it behaves linearly for small variations around this state. This assumption allows us to represent the system as a linear time invariant system (Eqn 2)

$$\Delta\dot{x}(t) = \left.\frac{\partial f}{\partial x}\right|_{x_0, p_0} \Delta x(t) + \left.\frac{\partial f}{\partial p}\right|_{x_0, p_0} \Delta p(t) = A\,\Delta x(t) + B\,\Delta p(t), \tag{2}$$

where $\Delta x(t) = x(t) - x_0$ and $\Delta p(t) = p(t) - p_0$ denote deviations from the considered steady-state. Equation (2) is obtained by truncating the Taylor expansion of Eqn (1) after the linear terms. The constant matrix $A$ is the Jacobian matrix of the nonlinear system and represents the network connectivity and the interactions between the network components around the considered steady-state. For example, in the case of gene networks, a zero element $A_{ij}$ indicates that the expression level of gene $j$ does not directly affect the expression of gene $i$. Positive and negative elements within $A$ imply activation and inhibition, respectively, of the corresponding components.

The aim here is to determine the Jacobian, or interaction matrix $A$, based on time-series measurements. We assume that the measurements are collected using a fixed sampling time $\Delta T$, and that at each sample the concentrations, or activity levels, of all $n$ components in $x$ are measured. Furthermore, we assume that the perturbations are constant between two sampling instants. Due to the discrete nature of the measurements, we reformulate the continuous time system (Eqn 2) as a discrete time system (Eqn 3)

$$\Delta x_{k+1} = A_d\,\Delta x_k + B_d\,\Delta p_k, \tag{3}$$

where $\Delta x_k = \Delta x(k\Delta T)$ and $\Delta p_k = \Delta p(k\Delta T)$. Using Eqn (3) we will, in the following, show how an estimation $\hat{A}_d$ for the discrete time Jacobian $A_d$ can be determined. An estimation $\hat{A}$ for the continuous time Jacobian $A$ can then be calculated through a reverse transformation to continuous time using the Euler approximation or the so-called zero-order hold discretization.

The commonly used Euler approximation
for the time derivatives of the states implies replacing the continuous derivatives by the finite difference $\Delta\dot{x}(t) = (\Delta x_{k+1} - \Delta x_k)/\Delta T$. The reverse transformation from discrete time $\hat{A}_d$ then yields the following approximation for the continuous time Jacobian (Eqn 4)

$$\hat{A}_{\text{euler}} = \frac{1}{\Delta T}\left(\hat{A}_d - I\right). \tag{4}$$

The Euler discretization method is approximate, and the goodness of the approximation is in general highly sensitive to the choice of the sampling time $\Delta T$. This 'approximate' relationship between the continuous and discrete time models can be avoided completely under the assumption that the perturbations $\Delta p$ are constant between sampling instants. Then, an analytical solution for $\Delta x(t)$ can be derived and hence also the exact relationship between $\Delta x_{k+1}$ and $\Delta x_k$. This leads to the zero-order hold discretization [13] (Eqn 5)

$$\hat{A}_{\text{zoh}} = \frac{1}{\Delta T}\operatorname{logm}\left(\hat{A}_d\right), \tag{5}$$

where $\operatorname{logm}(\hat{A}_d)$ denotes the matrix logarithm. Note there are no approximations involved in this transformation provided the parameter perturbations are constant between samples. (A more detailed discussion of the zero-order hold discretization and a comparison to the Euler discretization can be found in part of the supplementary material.)

Having determined an estimation $\hat{A}_d$ for $A_d$, an estimation for the continuous time Jacobian $A$ can be obtained using the above transformations. We will demonstrate that Eqn (5), in general, leads to a significantly better estimation of the Jacobian than Eqn (4).

In silico four gene network example

We consider a genetic network containing four genes, which has been used previously [2,4] as a test case for identification of interaction matrices and Jacobians. The motivation behind choosing such a small scale network for illustration of the method is to keep the exposition complete and reasonably compact. (The model equations and parameters are given in part of the supplementary material.)
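The difference between the two reverse transformations in Eqns (4) and (5) can be illustrated numerically. The following is a minimal sketch, not the authors' code: the 2x2 matrix `A` is a hypothetical stable Jacobian chosen only for illustration, and the helper functions `expm_eig` and `logm_eig` are assumptions of this sketch that compute the matrix exponential and logarithm through an eigendecomposition, which is valid only for diagonalizable matrices. A general-purpose routine such as `scipy.linalg.logm` would normally be preferred.

```python
import numpy as np

def expm_eig(M):
    """Matrix exponential via eigendecomposition (assumes M is diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.exp(w)) @ np.linalg.inv(V)).real

def logm_eig(M):
    """Principal matrix logarithm via eigendecomposition (assumes M is diagonalizable)."""
    w, V = np.linalg.eig(M)
    return (V @ np.diag(np.log(w.astype(complex))) @ np.linalg.inv(V)).real

# Hypothetical stable continuous-time Jacobian, for illustration only.
A = np.array([[-6.0, 2.0],
              [-3.0, -10.0]])
dT = 0.01  # sampling time

# Exact discrete-time Jacobian when perturbations are constant between samples.
Ad = expm_eig(A * dT)

# Reverse transformations back to continuous time.
A_euler = (Ad - np.eye(2)) / dT   # Eqn (4)
A_zoh = logm_eig(Ad) / dT         # Eqn (5)

err_euler = np.max(np.abs(A_euler - A))
err_zoh = np.max(np.abs(A_zoh - A))
print("max abs error, Euler:", err_euler)
print("max abs error, ZOH:  ", err_zoh)
```

With noise-free data the zero-order hold transformation recovers `A` up to rounding, while the Euler transformation carries an error of order $A^2 \Delta T / 2$, which grows with the sampling time.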
The nominal Jacobian at the considered steady-state is given by Eqn (6)

$$A = \begin{bmatrix} -6.45 & -2.92 & 0 & 2.54 \\ 0 & -8.17 & 0 & 3.93 \\ -2.31 & 2.80 & -14.46 & 0 \\ 0 & 0 & 10.22 & -9.74 \end{bmatrix} \tag{6}$$

and the corresponding network is illustrated in Fig.

Fig. Structure of the four-gene network. The interconnections represent the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates a negative (inhibitory) effect.

From system identification theory it is well known that a good estimation result requires a sufficient excitation of the system. (In particular, part of the supplementary material shows that perturbations have to be chosen such that the complete space of the network states is perturbed.) In the following we consider time-series data obtained from constant parameter perturbation experiments. The perturbed parameters correspond to the maximal enzyme rates involved in the transcription of the genes (part of the supplementary material). Furthermore, the magnitudes, as well as the direct effects of the perturbations, are assumed to be unknown and thus not used in the identification algorithm. The estimations of $\hat{A}_d$ are obtained using absolute measurements and applying Eqn 15. Unless otherwise stated, the zero-order hold discretization is used in the following.

Estimation of the Jacobian

We first performed an in silico experiment in which the maximal enzyme rate corresponding to the transcription of gene number one is perturbed by 1%. The sampling time is chosen as $\Delta T = 0.01$ h, and we collect six samples, the minimal number required for estimating the Jacobian when the size of the perturbation is unknown. The first sample is taken one time-step after the perturbation has been applied to the system. It should be noted that the sampling time is chosen sufficiently small to enable the fastest dynamics of the system to be captured. Applying the method proposed
above, we obtain the following estimate for the Jacobian:

$$\hat{A} = \begin{bmatrix} -6.45 & -2.90 & 0.01 & 2.52 \\ 0.00 & -8.17 & 0.00 & 3.93 \\ -2.31 & 2.77 & -14.40 & 0.01 \\ 0.00 & -0.09 & 10.22 & -9.77 \end{bmatrix}$$

Except for the (4,2) element, the estimated Jacobian is very close to the nominal Jacobian, the largest relative error in the nonzero elements being less than 1%. This is not surprising, as the perturbation to the system was chosen so small that the nonlinearity of the system played a relatively modest role. The fact that the (4,2) element is relatively poorly estimated is probably explained by more severe nonlinear effects for this specific relationship with the chosen parameter perturbation. Note that the nonlinear effects in general will depend on the parameter chosen for perturbation.

In real experiments, inhibition efficiencies by small interfering RNA (siRNA) or chemical inhibitors are much higher than in the experiment above. A more realistic experimental setting is to assume perturbations of 50%, and to perform several experiments, in which different parameters are perturbed. We performed four experiments and combined the obtained measurements. In each experiment, one of the maximal enzyme rates of the four genes is perturbed by 50%, and the minimum required number of samples is taken, three in each experiment. The sampling time is $\Delta T = 0.01$ h. The result of the estimation is given by:

$$\hat{A} = \begin{bmatrix} -6.45 & -2.69 & -0.01 & 2.59 \\ 0.00 & -8.14 & -0.02 & 4.06 \\ -2.19 & 2.90 & -14.38 & -0.01 \\ 0.07 & 0.08 & 8.51 & -9.72 \end{bmatrix}$$

Comparing with the 'true' linear Jacobian in Eqn (6), we see that the network has been identified with reasonable accuracy, the largest relative error in the nonzero elements being less than 20%. That we obtain such good results even for relatively large perturbations is partly explained by the fact that measurements from different experiments have been combined. This allows for different perturbations of the system and a reduction in the number of samples required in each experiment. The use
of different perturbed parameters in each experiment leads to a better excitation of the system, which is beneficial for the estimation result. The reduction in the time span of each experiment reduces the deviation from the initial state, thereby reducing the effects of nonlinearities.

The above results demonstrate that it is theoretically possible to determine the Jacobian from one experiment only, but that in practice usually more than one experiment will be preferable. How to choose the perturbations in an optimal way is out of the scope of this paper and a topic for future work. Instead we will, in the experiments below, consistently perform four experiments. (In each experiment the transcription rate of a different gene is perturbed using the parameters given in part of the supplementary materials.)

Effect of discretization method

In order to illustrate the importance of the method employed for determination of time derivatives, we herein perform estimations for different sampling times $\Delta T$ and perturbation magnitudes, using two discretization methods (Eqns 4 and 5). The relative estimation error, $e$, is calculated as (Eqn 7)

$$e = \frac{1}{N}\sum_{i=1}^{n}\sum_{j=1}^{n} |a_{ij}|, \qquad a_{ij} = \begin{cases} \dfrac{\hat{A}_{ij} - A_{ij}}{A_{ij}}, & A_{ij} \neq 0, \\ 0, & A_{ij} = 0, \end{cases} \tag{7}$$

where $N$ denotes the number of nonzero elements in the nominal Jacobian $A$. The results are shown in Table. In the table, we also show the error introduced in the derivatives by using the Euler approximation. The error is determined using the nominal Jacobian $A$ and is computed as

$$\eta(\Delta T) = \frac{\left\| e^{A\Delta T} - (I + A\Delta T) \right\|_{\text{sum}}}{\left\| e^{A\Delta T} \right\|_{\text{sum}}}.$$

The results clearly demonstrate that the zero-order hold discretization in Eqn (5) leads to a considerable improvement in the network identification, compared to the commonly used Euler approximation.

Impact of measurement uncertainty

We consider herein the effect of measurement uncertainty on the estimation of the Jacobian. The uncertainty is simulated in silico by adding noise to the absolute measurements $x_k$ as
follows:

$$x_k^{\text{noise}} = x_k + W x_0.$$

Here, $W$ denotes a diagonal matrix in which the entries are uniformly distributed random variables between −0.02 and 0.02. These values may appear small compared to the uncertainty in realistic biological experiments. However, the noise levels relative to the measured deviations $\Delta x$ correspond to over 50% for some samples. This should also be seen in relation to the fact that measurements of gene expressions are often carried out in a relative manner, corresponding to the measurement of $\Delta x$.

Table. Comparison of estimation errors. We compared the estimation errors obtained for different sampling times, discretization methods, and magnitudes of the parameter perturbation. (Euler), the Euler approximation; (ZOH), a zero-order hold discretization. The last column displays the relative error introduced by using the Euler approximation.

            Error (%), 50% perturbation    Error (%), 10% perturbation    Approximation error
ΔT          e (ZOH)      e (Euler)         e (ZOH)      e (Euler)         η(ΔT)
0.001       0.48         1.02              0.10         0.83              0.013
0.01        3.95         9.48              0.84         7.92              1.3
0.1         22.77        56.98             6.22         52.82             120

The sampling time is $\Delta T = 0.01$ h as before, and the considered magnitudes of parameter perturbations are 20, 50 and 100%. The results for different numbers of measured time-steps per experiment can be seen in Fig. In order to display the mean value and the standard deviation of the relative estimation error in the nonzero elements of the Jacobian, one hundred Monte-Carlo simulations have been conducted at each point.

The results show that the relative estimation error (Eqn 7) and its standard deviation decrease for increasing numbers of measured time-steps. It is interesting to note that the estimation error also decreases for increasing perturbation magnitudes. This is explained by the fact that the signal-to-noise ratio improves for larger perturbations, which is reasonable also in practice. This serves to illustrate that in general, there will exist a trade-off, in terms of effects of measurement uncertainty on
the one hand and the effects of nonlinearities on the other hand, when choosing the size of parameter perturbations.

[Figure: mean relative error and standard deviation (%) versus measured time-steps per experiment, for 20%, 50% and 100% perturbations.]
Fig. Mean value and standard deviation of the relative estimation error (Eqn 7) in the nonzero elements of the Jacobian, obtained from 100 Monte-Carlo simulations.

Impact of sampling time

In order to illustrate the problems occurring in networks with dynamic modes that are too fast to capture with the available sampling time, we considered identification of a network consisting of five genes. The network is a modification of the four gene network used in the previous example, obtained by adding a fifth gene with relatively fast dynamics. (The equations, parameters, and the nominal Jacobian are given in part seven of the supplementary material.)

Fig. Structure of the five-gene network. The interconnections determine the direct interactions between the genes. An arrow indicates a positive effect on the gene transcription, and a bar indicates a negative effect.
The structure of the five gene network is shown in Fig. In the considered network, the degradation rate of the mRNA of gene five has been chosen to be much faster than the degradation rates for the other mRNAs, thereby introducing a relatively fast dynamic mode. The sampling time we employ is too large to capture this fast dynamic mode.

Data for the estimation of the Jacobian of the system are generated in silico in the following way:
(a) Five experiments are performed, in each of which a 50% repression of one of the genes is simulated. In in silico implementations, this corresponds to a parameter perturbation of −50% in the maximal enzyme rate. We stress that the magnitude of the perturbation is assumed unknown when we apply the identification algorithm.
(b) In each experiment the mRNA concentrations, corresponding to all five genes, are measured at four consecutive time-steps. The first sample is taken one time-step after the perturbation is applied to the system.
(c) The perturbation is applied while the system is not in the steady-state. In in silico environments, this is simulated by introducing the perturbation while all mRNA concentrations are 5% below their steady-state values. This reflects the fact that a biological system in general will not be in a steady-state when perturbations are applied in a real experiment. Furthermore, the steady-state is assumed to be unknown and thus not used in the identification.
(d) The sampling time is chosen to be $\Delta T = 0.01$ h.

Following the approach discussed above, we collect the measurements and find that the smallest singular value of the measurement matrix $M$ is $\sigma_1 = 0.00026$, which is relatively close to zero compared to the other singular values. Thus, we conclude that the chosen sampling time was too large with respect to the fastest dynamics of the system. Using the zero-order hold discretization to determine $\hat{A}$ from $\hat{A}_d$, ignoring the fact that some modes have not been captured in the data, the following result
is obtained:

$$\hat{A} = \begin{bmatrix} -5.27 & -4.29 & -0.03 & 2.49 & -5.57 \\ 0.68 & -8.58 & 0.02 & 3.76 & -3.33 \\ 1.19 & 2.27 & -14.32 & 0.02 & -9.48 \\ -19.32 & 15.60 & 9.50 & -9.72 & 92.46 \\ 30.52 & -25.11 & 0.03 & 0.07 & -149.4 \end{bmatrix}.$$

As can be verified easily, this result does not capture the structure of the network in Fig correctly. For example, the estimate of the Jacobian shows a large direct effect of gene on gene 4, which is incorrect.

The singular vector $u_1$, corresponding to $\sigma_1$, shows that the fifth component in $x$, that is, the mRNA concentration corresponding to gene five, is the most dominant with respect to the singularity of the measurement matrix. Using the approach outlined in the Method section, neglecting the measurements of the fifth component, the following Jacobian for the reduced network is obtained:

$$\hat{A}_{1,2,3,4} = \begin{bmatrix} -6.43 & -3.34 & -0.01 & 2.49 \\ -0.02 & -8.01 & 0.03 & 3.76 \\ -0.78 & 3.89 & -14.28 & 0.02 \\ -0.04 & -0.17 & 9.12 & -9.72 \end{bmatrix}$$

The identified Jacobian is close to the true Jacobian for the reduced network, with relative errors in all nonzero elements being smaller than 20%, and all zero elements being identified as close to zero. It is important to point out that the Jacobian of the reduced network is not supposed to be equal to the Jacobian of the four gene network in the previous example. The dynamics of the reduced Jacobian also correspond reasonably well to the slow dynamics of the five gene network, as can be seen from the computed eigenvalues in Table. The structure of the identified reduced Jacobian reflects well the structure of the network in Fig when gene five is taken out. For instance, gene one directly affects gene three when the dynamics of gene five are neglected, or assumed to be infinitely fast.

The results presented above show that it is indeed possible to obtain a useful identification result even in the case that fast dynamics are not captured correctly. Moreover, one can obtain the information on which components are involved in the fast reactions, and their static relationship with the other components of the network.
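The time-scale argument behind the reduced network can be checked with a few lines of code. The sketch below is an illustration rather than part of the original study: it takes the estimated reduced Jacobian as printed above, computes its eigenvalues, and compares the corresponding time constants ($-1/\mathrm{Re}\,\lambda$) with the sampling time. The fast eigenvalue −571.7 of the nominal five gene Jacobian is taken from the eigenvalue table of the paper; the variable names are assumptions of this sketch.

```python
import numpy as np

# Estimated reduced Jacobian (genes 1-4), entries as printed in the text.
A_reduced = np.array([
    [-6.43, -3.34,  -0.01,  2.49],
    [-0.02, -8.01,   0.03,  3.76],
    [-0.78,  3.89, -14.28,  0.02],
    [-0.04, -0.17,   9.12, -9.72],
])

dT = 0.01                   # sampling time in hours
fast_eig_nominal = -571.7   # fast mode of the nominal five gene Jacobian

eigvals = np.linalg.eigvals(A_reduced)

# Time constant of each mode: how long a deviation takes to decay by a factor e.
time_constants = -1.0 / eigvals.real
fast_time_constant = -1.0 / fast_eig_nominal

print("eigenvalues of reduced Jacobian:", np.sort_complex(eigvals))
print("slow time constants (h):", np.sort(time_constants))
print("fast time constant (h):", fast_time_constant, "vs sampling time", dT)
```

The fast mode decays in well under one sampling interval, which is why it leaves almost no trace in the sampled data, whereas all modes of the reduced Jacobian evolve over several sampling intervals.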
Table. Comparison between the eigenvalues of the nominal Jacobian $A$ of the five gene network and the eigenvalues of the estimated reduced Jacobian $\hat{A}_{1,2,3,4}$; $i = \sqrt{-1}$.

Network                            Eigenvalues
A (nominal)                        −571.7    −13.28 ± i 3.16    −6.93    −5.15
Â_{1,2,3,4} (estimated)            None      −13.27 ± i 3.36    −7.12    −4.80

Conclusions

In this paper we have discussed the qualitative and quantitative identification of network interactions based on time-series measurements obtained from perturbation experiments and least-squares estimation. The proposed method is equally applicable to identification of gene, protein, and metabolic networks. Due to the fact that the method requires at least $n + 1$ samples, where $n$ is the number of network components, the method is relatively costly for large scale networks, and thus so far limited to the identification of smaller networks. However, as high throughput techniques are evolving fast, it is probable that high-frequency sampling can be obtained in the near future. Thus, wet-lab based experimental verification of the proposed method remains as future study.

The proposed approach has several advantages over other approaches: the steady-state of the system does not need to be known nor achieved prior to the perturbation; general type perturbations can be used; dynamics relatively fast compared to the sampling time can be detected and removed from the identification; linear dependencies due to moiety conservations can be identified and processed; samples from any number of experiments can be combined in the identification, as long as these experiments have been carried out around the same steady-state.

We have shown that measurement uncertainty can have a large effect on the identification result. Possible solutions for uncertainty and noise are to collect and use more measurement data, and to make use of available a priori structural knowledge. In addition, methods from identification
theory on estimating and filtering noise can be incorporated. Furthermore, the signal-to-noise ratio can be increased by choosing larger perturbations. However, the latter can lead to increased nonlinear effects, and a trade-off between the two effects therefore has to be taken into consideration.

Instead of using the widely accepted Euler discretization, we have shown that the zero-order hold discretization, in general, results in a significantly improved estimation and should be used in all methods aimed at identifying dynamic biochemical networks.

We have not discussed explicitly the effect of autoregulation of biological systems by self-negative feedback. For example, certain components might be regulated by homeostatic effects and a response to perturbations might not be visible in the measurement data. However, under the assumption that these effects are significantly slower than the sampling time, it is reasonable to assume that the proposed method will lead to an acceptable result. Furthermore, we have only considered the case of the estimation around a stable steady-state of the network. In the case of oscillations, created within the network or affecting the network, one would have to deal with time-varying Jacobians, which is outside the scope of this paper.

Experimental procedures

Method

In this section, we present a method for the determination of an estimate $\hat{A}_d$ of the discrete Jacobian $A_d$ based on minimization of a least-squares criterion. Some related issues, such as the choice of the sampling time and how to deal with moiety conservations in metabolic and signaling networks, are also discussed. Least-squares based estimation is used widely within many areas of science and engineering, an important reason being that it is applicable even in the case where no statistical information about the measurements is available [14]; this is typically the case with measurement data from biological systems. Excitation of a biochemical system is usually performed as a
constant parameter perturbation, e.g. gene knockouts or the alteration of gene transcription rates. Especially in vivo, it is not possible to quantify the applied perturbations, meaning that the magnitude of the applied perturbations is unknown. Furthermore, for gene networks, it is usually also unknown which components the perturbations affect in a direct manner. Previously proposed methods often assume this information to be available, at least partially. Sontag et al., for example, assume that the magnitude of the perturbations is unknown but the genes that are directly affected by the perturbed parameters are known [4]. In the following we consider both the magnitude and the direct effects of perturbations to be unknown. To keep the exposition relatively simple, we assume, however, that the parameter perturbations are constant over time. (However, in part of the supplementary material we show how this assumption can be relaxed to take time varying parameter perturbations, known or unknown, and pulse perturbations into account.)
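Because the estimation targets the discrete-time Jacobian $A_d$, a continuous-time Jacobian has to be recovered from it afterwards. The following sketch contrasts the two conversions mentioned in the Conclusions, assuming the standard zero-order hold relation $A_d = e^{A \Delta T}$ and the Euler approximation $A_d \approx I + A \Delta T$. The matrix is the reduced Jacobian reported in the Results; the sampling time `dt` and the function names are illustrative assumptions, and the matrix exponential and logarithm are computed through an eigendecomposition, which presumes diagonalizable matrices.

```python
import numpy as np

def expm_eig(A, dt):
    """Matrix exponential exp(A * dt) via eigendecomposition (A diagonalizable)."""
    lam, V = np.linalg.eig(A)
    return (V @ np.diag(np.exp(lam * dt)) @ np.linalg.inv(V)).real

def jacobian_from_zoh(Ad, dt):
    """Invert Ad = exp(A * dt) using the principal matrix logarithm."""
    mu, V = np.linalg.eig(Ad)
    log_mu = np.log(mu.astype(complex))   # principal branch
    return (V @ np.diag(log_mu / dt) @ np.linalg.inv(V)).real

def jacobian_from_euler(Ad, dt):
    """Invert the Euler approximation Ad ~ I + A * dt."""
    return (Ad - np.eye(Ad.shape[0])) / dt

# Reduced Jacobian from the Results section, used here as the "true" A.
A = np.array([[-6.43, -3.34, -0.01, 2.49],
              [-0.02, -8.01, 0.03, 3.76],
              [-0.78, 3.89, -14.28, 0.02],
              [-0.04, -0.17, 9.12, -9.72]])
dt = 0.05                                  # sampling time (assumed)

Ad = expm_eig(A, dt)                       # what a noise-free identification would return
err_zoh = np.max(np.abs(jacobian_from_zoh(Ad, dt) - A))
err_euler = np.max(np.abs(jacobian_from_euler(Ad, dt) - A))
print(f"max abs error, zero-order hold: {err_zoh:.2e}")
print(f"max abs error, Euler:           {err_euler:.2e}")
```

In this noise-free setting the zero-order hold conversion is exact up to rounding, while the Euler conversion carries a discretization error that grows with the sampling time; with noisy data both would of course be less accurate.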
FEBS Journal 272 (2005) 2141–2151 © 2005 FEBS

We assume that the network response to the applied perturbations is sufficiently small such that, in the time range of the measurements, the system can be regarded as linear. Equation (3) then describes the behavior of the system (Eqn 1) for variations around the steady-state $(x_0, p_0)$. In the following we will replace the time-dependent perturbation vector $\Delta p_k$ by a constant perturbation vector $\Delta p$. As $B_d$ and $\Delta p$ are both unknown, we can replace the corresponding term by a constant unknown perturbation $\Delta u$, as follows:

\[
\Delta x_{k+1} = A_d \Delta x_k + B_d \Delta p = A_d \Delta x_k + \Delta u = [A_d, \Delta u] \begin{bmatrix} \Delta x_k \\ 1 \end{bmatrix}. \tag{8}
\]

Here, $[A_d, \Delta u]$ is a matrix, consisting of the discrete time Jacobian and the unknown perturbation vector $\Delta u$, representing the unknowns in the equation. The vectors $\Delta x_{k+1}$ and $[\Delta x_k^T, 1]^T$ are given by the measurements.

Assume now that the system (Eqn 1) at time $k = k_0$ is in steady-state ($\Delta x_0 = 0$) and that an unknown perturbation, corresponding to a nonzero perturbation vector $\Delta u$, is applied to the system at $k = k_0$ and held constant. Without loss of generality we can assume herein $k_0 = 0$. The response of the network to this perturbation is measured at the following time-steps. The column vector $\Delta x_k$ represents the concentrations of the network components relative to the steady-state concentrations obtained at time step $k > 0$. Measuring the response of the network until time-step $n+2$, where $n$ corresponds to the number of involved components in the network, and arranging these concentration vectors into matrices, we obtain the following matrix version of Eqn (8):

\[
R = [\Delta x_{n+2}, \Delta x_{n+1}, \ldots, \Delta x_2] = [A_d, \Delta u] \begin{bmatrix} \Delta x_{n+1} & \Delta x_n & \cdots & \Delta x_1 \\ 1 & 1 & \cdots & 1 \end{bmatrix} = [A_d, \Delta u] M. \tag{9}
\]

Under the assumption that the $(n+1) \times (n+1)$ measurement matrix $M$ on the right hand side in (Eqn 9) can be inverted, the unknown matrix $[A_d, \Delta u]$ can be determined from:

\[
[A_d, \Delta u] = [\Delta x_{n+2}, \Delta x_{n+1}, \ldots, \Delta x_2] \begin{bmatrix} \Delta x_{n+1} & \Delta x_n & \cdots & \Delta x_1 \\ 1 & 1 & \cdots & 1 \end{bmatrix}^{-1} = R M^{-1}.
\]

Invertibility of $M$ can be guaranteed under a controllability condition from linear systems theory (see proof in part of the supplementary material), and $A_d$ and the unknown perturbation $\Delta u$ can then be determined from a single experiment.

The determination of $A_d$ and $\Delta u$ is exact only in the case where the system is linear and no measurement uncertainty is present. In the case of noisy measurements and a nonlinear biochemical network, only estimates $\hat{A}_d$ and $\Delta\hat{u}$ of the unknowns can be obtained. It is then also important to measure and use more time-steps than the minimum required. In the case of more than $n + 2$ measured time-steps, the matrices $M$ and $R$ are constructed as above, but with more columns, corresponding to the measurements in the additional time-steps. Thus, $M$ will no longer be a square matrix and the pseudoinverse needs to be used instead:

\[
[\hat{A}_d, \Delta\hat{u}] = R M^T (M M^T)^{-1}; \tag{10}
\]

this corresponds to a least-squares solution for $\hat{A}_d$ and $\Delta\hat{u}$.

As the identification of the overall network structure requires a relatively large number of measurement samples, we consider combining data from several experiments. It has to be pointed out that these experiments should be performed around the same steady-state, as only then an averaging effect in the determination of $A_d$ can be avoided. Small variations of the initial state around the steady-state are admissible, as long as the system still can be seen as behaving linearly. If $r$ experiments are performed, the result matrix $R$ can be constructed as Eqn (11):

\[
R = [R_1, \ldots, R_r], \tag{11}
\]

where $R_i$ is the result matrix corresponding to the $i$-th experiment. For each experiment, $R_i$ is constructed as Eqn (12):

\[
R_i = [\Delta x^i_{m_i}, \Delta x^i_{m_i - 1}, \ldots, \Delta x^i_2], \tag{12}
\]

where $m_i$ determines the number of measured time-steps in experiment $i$. Note that the measurements are assumed to start at time $k = 1$ and end at time $k = m_i$. The measurement matrix $M$ is constructed as Eqn (13):

\[
M = \begin{bmatrix}
M_1 & M_2 & \cdots & M_r \\
\mathbf{1} & \mathbf{0} & \cdots & \mathbf{0} \\
\mathbf{0} & \mathbf{1} & \cdots & \mathbf{0} \\
\vdots & \vdots & \ddots & \vdots \\
\mathbf{0} & \mathbf{0} & \cdots & \mathbf{1}
\end{bmatrix}, \tag{13}
\]

where $M_i$ is the matrix containing the
measurements corresponding to the $i$-th experiment, shifted by one time-step relative to the measurements in $R_i$. For each experiment $M_i$ is constructed as Eqn (14):

\[
M_i = [\Delta x^i_{m_i - 1}, \Delta x^i_{m_i - 2}, \ldots, \Delta x^i_1]. \tag{14}
\]

The $\mathbf{1}$ and $\mathbf{0}$ elements in Eqn (13) denote row vectors with unity and zero entries, respectively. These vectors have the same width as the corresponding measurement matrices $M_i$. The $\mathbf{1}$-vectors have the same origin as those in Eqn (9). However, in the case of several experiments, one has to take into account that the perturbation $\Delta u_i$ in the $i$-th experiment can be different from the perturbation in the other experiments, and thus for each experiment one perturbation vector needs to be taken into account. (The construction of the matrices $M$ and $R$ is illustrated for a simple example in part of the supplementary material.)

Estimations for the discrete time Jacobian $A_d$ and the unknown perturbations $\Delta u_1, \ldots, \Delta u_r$ can now be determined from:

\[
[\hat{A}_d, \Delta\hat{u}_1, \ldots, \Delta\hat{u}_r] = R M^T (M M^T)^{-1}. \tag{15}
\]

In the case of combined experiments, the total number of columns of $M$ should at least equal $n + r$. For the construction of $R$ and $M$ at least $n + 2r$ measured time-steps are required. Note that Eqn (15) involves the pseudoinverse of $M$, corresponding to a least-squares estimation of $A_d$ and $\Delta u_i$.

An important side effect of incorporating estimation of the applied perturbations using measurement data is that also nonzero, or unsteady-state, initial conditions can be handled. This follows from the fact that initial unknown deviations from the steady-state in fact can be represented as an unknown perturbation. Thus, the proposed method can be used even in cases where the steady-state $x_0$ is unknown. To see this, Eqn (8) is reformulated using the relations $\Delta x_{k+1} = x_{k+1} - x_0$ and $\Delta x_k = x_k - x_0$ to obtain:

\[
x_{k+1} = A_d x_k + u,
\]

where $u$ is now given by $u = \Delta u + (I - A_d) x_0$, representing a lumped perturbation, consisting of the unknown steady-state and the
unknown perturbation $\Delta u$. Thus, rather than using relative measurements $\Delta x_k$, the absolute measurements $x_k$ can be used directly for the estimation, and the steady-state of the system does not need to be known. In order to use this approach, it is sufficient to replace the $\Delta x^i_k = x^i_k - x_0$ in Eqns (12) and (14) by the corresponding absolute measurements $x^i_k$. Equation (15) then becomes

\[
[\hat{A}_d, \hat{u}_1, \ldots, \hat{u}_r] = R M^T (M M^T)^{-1},
\]

where the only difference lies in the fact that now the lumped perturbations $u_i$, instead of $\Delta u_i$, are estimated. Note, however, that the method is still based on the assumption that the network is behaving linearly around the same steady-state for all experiments. Hence, the initial states in all experiments should in general not be too far from the steady-state at which the Jacobian is to be determined.

The advantage of the approach proposed above is that very general types of perturbations can be applied to the system, and that information about the perturbations is not required. (As mentioned earlier in the text, in part of the supplementary material we relax the assumption of constant parameter perturbations.)
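The estimation in Eqn (15), with absolute measurements in place of deviations, can be sketched in a few lines. Everything named here (`estimate_jacobian`, `simulate`, the matrix `Ad_true`, the vectors `u1` and `u2`, and the two simulated experiments) is a hypothetical illustration, not data from the paper. The least-squares problem is solved with `numpy.linalg.lstsq` rather than by forming $M M^T$ explicitly, which gives the same solution when $M$ has full row rank. Columns are stored in increasing time order, unlike the decreasing order of Eqns (12) and (14); the estimate does not depend on the column order as long as $R$ and $M$ use the same one.

```python
import numpy as np

def estimate_jacobian(experiments):
    """Least-squares estimate of [Ad, u_1, ..., u_r] from absolute measurements.

    experiments: list of arrays of shape (n, m_i) whose columns are the
    measured states x_1, ..., x_{m_i} of experiment i.
    Returns Ad_hat (n x n) and U_hat (n x r), one lumped perturbation per experiment.
    """
    n = experiments[0].shape[0]
    r = len(experiments)
    R_blocks, M_blocks = [], []
    for i, X in enumerate(experiments):
        cols = X.shape[1] - 1
        indicator = np.zeros((r, cols))
        indicator[i, :] = 1.0              # the 1 and 0 row vectors of Eqn (13)
        R_blocks.append(X[:, 1:])          # states one step ahead
        M_blocks.append(np.vstack([X[:, :-1], indicator]))
    R = np.hstack(R_blocks)
    M = np.hstack(M_blocks)
    # Solve theta @ M = R in the least-squares sense.
    theta_T, *_ = np.linalg.lstsq(M.T, R.T, rcond=None)
    theta = theta_T.T
    return theta[:, :n], theta[:, n:]

def simulate(Ad, u, x_start, steps):
    """Noise-free samples of x_{k+1} = Ad x_k + u, returned as columns."""
    xs = [np.asarray(x_start, dtype=float)]
    for _ in range(steps - 1):
        xs.append(Ad @ xs[-1] + u)
    return np.array(xs).T

# Hypothetical two-component network and two experiments with different
# (unknown to the estimator) lumped perturbations and initial states.
Ad_true = np.array([[0.9, 0.1],
                    [-0.2, 0.8]])
u1 = np.array([0.1, 0.3])
u2 = np.array([0.25, 0.2])
data = [simulate(Ad_true, u1, [1.0, 2.0], 5),
        simulate(Ad_true, u2, [1.2, 1.9], 5)]

Ad_hat, U_hat = estimate_jacobian(data)
print(np.round(Ad_hat, 6))
print(np.round(U_hat, 6))
```

With noise-free linear data the true matrix and both perturbation vectors are recovered up to rounding; with real measurements the same call would return the least-squares estimates, and more time-steps per experiment would be advisable.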
Choice of sampling time and dealing with moiety conservations

Biochemical networks generally contain dynamic modes with a wide range of time constants. In order to identify the full Jacobian from time-series measurements, the sampling time $\Delta T$ needs to be chosen so small that even the fastest dynamics are captured. Due to experimental limitations, it may, however, not be possible to realize the required sampling time. Furthermore, as the dynamics of the system in general are unknown in advance, it is hard to determine the required sampling time in advance. Herein we will consider how interactions with dynamics significantly faster than the sampling time can be identified a priori from the collected data, and how these interactions then can be extracted from the data prior to identification of the network Jacobian. We also show that the same approach can be used to deal with moiety conservations within the considered network.

Assume the fastest mode of the linearized system (Eqn 2) corresponds to an eigenvalue $\lambda_f$ or a time-constant $\tau_f = 1/|\lambda_f|$. In order to obtain a reasonable estimate of the corresponding dynamics, a sampling time $\Delta T$ smaller than $\tau_f$ should be used [15]. If the sampling time is chosen to be significantly larger than $\tau_f$, then the transients of this mode will essentially disappear between samples. This implies that there exists an almost linear dependency between the measurements of the sampled states, and hence that the measurement matrix $M$ will be (almost) rank deficient. In general, and assuming the perturbations fulfil the controllability condition discussed above, the deficiency will be equal to the number of modes with time-constant significantly smaller than the sampling time.

The linear dependency, corresponding to the interactions with dynamics significantly faster than the sampling time, can be determined directly from the collected measurements using a singular value decomposition (SVD) of the measurement matrix, that is, $M = U S V^H$. The vectors
$u_i$ corresponding to singular values $\sigma_i$ close to zero will correspond to the singular directions.

A possible solution to the problem with too slow sampling is to identify the components taking part in the fast dynamics, that is, components corresponding to nonzero elements in the singular vector $u_i$, and to remove one of them for each fast mode. Any component can in principle be chosen, but a reasonable choice is to neglect the one being most dominant with respect to the singularity, that is, corresponding to the largest element in the vector $u_i$. Repeating this procedure for every singular value of $M$ close to zero will lead to a measurement matrix with full rank, allowing determination of the Jacobian of the network, reduced by one component for each fast mode.

The presence of moiety conservations in metabolic or signaling networks has the same effect on the estimation of the Jacobian $A_d$ as modes with dynamics significantly faster than the sampling time. In other words, some of the concentrations of the intermediates in the networks will be linearly dependent, resulting in a measurement matrix without full row rank. Thus, the same approach as presented above for dealing with linear dependencies due to a too large sampling time can be used to determine the components involved in the moiety conservations, and a reduced Jacobian can then be determined for a reduced network containing only the independent components. The algebraic relations corresponding to moiety conservations are obtained directly from the SVD of the measurement matrix, $M$. Note that if the linear dependencies due to moiety conservations and/or fast dynamic modes are not eliminated from $M$ prior to the determination of $\hat{A}_d$ and $\hat{A}$, the resulting Jacobian will contain gross errors.

Acknowledgments

Henning Schmidt and Elling W Jacobsen acknowledge financial support from the Swedish Research Council. Kwang-Hyun Cho acknowledges the support received by
a grant from the Korea Ministry of Science and Technology (Korean Systems Biology Research Grant, M10309000006–03B5000-00211) and also by The 21C Frontier Microbial Genomics and Application Center Program, Ministry of Science and Technology (Grant M605-0204-3-0), Republic of Korea.

References

1 Kanehisa M & Goto S (2000) KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 28, 27–30. http://www.genome.jp/kegg/
2 Kholodenko BN, Kiyatkin A, Bruggeman FJ, Sontag E, Westerhoff HV & Hoek JB (2002) Untangling the wires: a strategy to trace functional interactions in signaling and gene networks. Proc Natl Acad Sci USA 99, 12841–12846.
3 Gardner T, di Bernardo D, Lorenz D & Collins J (2003) Inferring genetic networks and identifying compound mode of action via expression profiling. Science 301, 102–105.
4 Sontag E, Kiyatkin A & Kholodenko B (2004) Inferring dynamic architecture of cellular networks using time series of gene expression, protein and metabolite data. Bioinformatics 20, 1877–1886.
5 Ross J (2003) New approaches to the deduction of complex reaction mechanisms. Acc Chem Res 36, 839–847.
6 Vance W, Arkin A & Ross J (2002) Determination of causal connectivities of species in reaction networks. Proc Natl Acad Sci USA 99, 5816–5821.
7 Torralba A, Yu K, Shen P, Oefner P & Ross J (2003) Experimental test of a method for determining causal connectivities of species in reactions. Proc Natl Acad Sci USA 100, 1494–1498.
8 Arkin A & Ross J (1995) Statistical construction of chemical reaction mechanisms from measured time-series. J Phys Chem 99, 970–979.
9 de la Fuente A, Brazhnik P & Mendes P (2002) Linking the genes: inferring quantitative gene networks from microarray data. Trends Genet 18, 395–398.
10 Mihaliuk E, Skodt H, Hynne F, Sorensen PG & Showalter K (1999) Normal modes for chemical reactions from time series analysis. J Phys Chem 103, 8246–8251.
11 Crampin EJ, Schnell S & McSharry PE (2004)
Mathematical and computational techniques to deduce complex biochemical reaction mechanisms. Prog Biophys Mol Biol 86, 77–112.
12 Siddhartha J & van Schuppen JH (2001) Modelling and control of cell reaction networks. PNA-R0116, CWI, Amsterdam.
13 Rugh W (1996) Linear System Theory. Prentice Hall, Upper Saddle River, NJ, USA.
14 Kay S (1993) Fundamentals of Statistical Signal Processing. Prentice Hall, Upper Saddle River, NJ, USA.
15 Ljung L (1999) System Identification – Theory for the User, 2nd edn. Prentice Hall, Upper Saddle River, NJ, USA.

Supplementary material

The following material is available from http://www.blackwellpublishing.com/products/journals/suppmat/EJB/EJB4605/EJB4605sm.htm

Appendix S1. Additional proofs and models.