Genome Biology 2007, Volume 8, Issue 4, Article R54 (Genome Biology 2007, 8:R54). Open Access.

Method

Statistical tools for synthesizing lists of differentially expressed features in related experiments

Marta Blangiardo and Sylvia Richardson

Address: Centre for Biostatistics, Imperial College, St Mary's Campus, Norfolk Place, London W2 1PG, UK.
Correspondence: Marta Blangiardo. Email: m.blangiardo@imperial.ac.uk

© 2007 Blangiardo and Richardson; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Abstract

We propose a novel approach for finding a list of features that are commonly perturbed in two or more experiments, quantifying the evidence of dependence between the experiments by a ratio. We present a Bayesian analysis of this ratio, which leads us to suggest two rules for choosing a cut-off on the ranked list of p values. We evaluate and compare the performance of these statistical tools in a simulation study, and show their usefulness on two real datasets.

Background

In the microarray framework researchers are often interested in the comparison of two or more similar experiments that involve different treatments/exposures, tissues, or species.
The aim is to find common denominators between these experiments in the form of a parsimonious list of features (for example, genes, biological processes) for which there is strong evidence that the listed features are commonly perturbed in both (all) the experiments and from which to start further investigations. For example, finding common perturbation of a known pathway in several tissues will indicate that this pathway is involved in a systemic response, which is conserved between tissues.

Ideally, such a problem should involve the joint re-analysis of the two (all) experiments, but this is not always easily feasible (for example, different platforms), and is, in any case, computationally demanding. Alternatively, a natural approach is to consider the ranked list of features derived in each experiment, and to define a process by which a meaningful intersection of the lists can be computed and statistically assessed.

Methods to synthesize probability measures from several experiments (for example, p values) have been proposed in the literature. In 2002, Rhodes et al. [1] applied Fisher's inverse chi-square test to lists of p values from different experiments, with the aim of pooling them together in a meta-analysis. The idea was improved and extended by Hwang et al. [2], who proposed to assign different weights to different experiments and introduced two more statistics in addition to Fisher's weighted F (Mudholkar-George's weighted T and Liptak-Stouffer's weighted Z). However, as these methods look at evidence of global differential expression across the experiments and define sets of genes based on the global p values, their aim is different from ours: we could say that they are focused on statistically assessing the union of different experiments while we are interested in their intersection.
Which statistical approach best evaluates the strength of the intersection remains an open question, as discussed recently by Allison et al. [3]. As a first approach, the authors suggest that by using a pre-specified threshold on the p value for differential expression in each experiment, the outcomes of two experiments can be treated as two dichotomous variables. A chi-square test of independence can then be performed to evaluate whether the degree of overlap between experiments is greater than expected by chance. But this way of proceeding is heavily dependent on the choice of a threshold used to dichotomize the outcome of the two experiments and neglects useful information on degrees of evidence of differential expression in each experiment.

Published: 11 April 2007. Genome Biology 2007, 8:R54 (doi:10.1186/gb-2007-8-4-r54). Received: 7 July 2006. Revised: 13 November 2006. Accepted: 11 April 2007. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2007/8/4/R54

We propose a novel and powerful method for synthesizing such lists that is based on two ideas. Firstly, the departure from the null hypothesis of a chance association between the results of each experiment is characterized by a ratio measuring the relative increase of the number of features in common with respect to the number expected by chance. Secondly, the statistical significance of the ratio is assessed and exploited to propose rules to define synthesized lists.

For the sake of clarity, from now on we will discuss our methodology in the context of gene expression experiments where the features of interest are genes and the aim is to synthesize lists of differentially expressed genes.
But we stress that our methodology is applicable to synthesize ranked lists of any feature of interest from a variety of experiments, as long as each feature is associated with a 'measure of interest' on a probability scale.

Representing the data in a series of 2 × 2 contingency tables, we first specify a (conditional) model of independence that treats the marginal frequencies in each list as fixed quantities: we calculate the ratio between observed and expected number of genes in common for each table and focus attention on the maximum ratio, that is, the strongest deviation from independence. We propose a permutation based test to assess its significance and discuss some shortcomings of this simple approach.

We enlarge the scenario by specifying a joint model of the two experiments (treating the marginal frequencies of differential expression in each experiment as random quantities, instead of fixed) that is formulated in a Bayesian framework. Inference can be based on the marginal posterior distribution of the maximum of the ratio of the observed to the expected probability of genes to be in common.

Note that procedures based on maximum statistics are used in a variety of contexts to focus the analysis on particular subsets of interest; for example, in geographical epidemiology as a way of investigating maximum disease risks around a point source [4], or for scanning time or spatial windows for clusters of cases [5]. In gene expression studies, maximum-based statistics have been proposed for evaluating if a priori defined gene sets are enriched relative to a list of genes ranked on the basis of their differential expression between two classes [6]. Focusing on the maximal ratio we are not aiming at finding the largest list of genes in common, but we are interested in a parsimonious list associated with the strongest evidence of dependence between experiments.
However, by being very specific (few false positives), this procedure tends to be rather conservative and to be associated with a narrow list of genes in common. To increase sensitivity and account for larger lists, we propose a second rule that focuses attention on the list associated with a ratio equal to or greater than two. We show in our simulations that this rule leads to a good compromise between false positives and false negatives, indicating very high specificity and good sensitivity. It is also close to achieving the minimum of the total error (sum of false positives and false negatives).

We evaluate the performance of our methodology on simulated data and compare the results to those obtained using Hwang et al.'s approach. Then, we apply our method to two real case studies, highlighting the biological interest of the obtained results.

Results

We demonstrate the statistical and biological potential of our methodology using simulated data and publicly available datasets. For the simulation we follow the setup described in [2]. The first real example uses public data from an experiment that evaluates the effect of mechanical ventilation on lung gene expression of mice and rats. The second real example uses public data from an experiment that evaluates the effect of high fat diet on fat and skeletal muscle of mice.

2 × 2 Table: conditional model for two experiments

Suppose we want to compare the results of two microarray experiments, each of them reporting for the same set of $n$ genes a measure of differential expression on a probability scale (for example, p value; Table 1). We rank the genes according to the recorded probability measures. For each cut-off $q$ ($0 \le q \le 1$), we obtain the number of differentially expressed genes for each of the two lists as $O_{1+}(q)$ and $O_{+1}(q)$ and the number $O_{11}(q)$ of differentially expressed genes in common between the two experiments (Table 2).
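The counts that fill each 2 × 2 table can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a gene is called differentially expressed when its p value is less than or equal to $q$ (the text does not say whether the inequality is strict), and the function name `contingency_counts`, the toy p values and the three cut-offs are hypothetical.

```python
def contingency_counts(p_a, p_b, q):
    """Return (O1+, O+1, O11) for one cut-off q.

    Assumption: a gene is called differentially expressed when p <= q.
    """
    if len(p_a) != len(p_b):
        raise ValueError("both lists must cover the same n genes")
    de_a = [p <= q for p in p_a]
    de_b = [p <= q for p in p_b]
    o1p = sum(de_a)                                  # DE in experiment A
    op1 = sum(de_b)                                  # DE in experiment B
    o11 = sum(a and b for a, b in zip(de_a, de_b))   # DE in both
    return o1p, op1, o11


# Hypothetical p values for n = 6 genes, in the same gene order in both lists.
p_a = [0.001, 0.02, 0.30, 0.04, 0.80, 0.009]
p_b = [0.004, 0.50, 0.03, 0.01, 0.90, 0.20]

# One table per cut-off on a (hypothetical) grid of thresholds.
grid = [0.01, 0.05, 0.5]
table = {q: contingency_counts(p_a, p_b, q) for q in grid}
for q in grid:
    print(q, table[q])
```

Because every count is non-decreasing in $q$, the tables obtained along the grid are nested, which is why they cannot be treated as independent.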
The threshold $q$ is a continuous variable but, in practice, we consider a discretization of $q$. In the present paper, we specify a vector $q = (q_0 = 0, q_1 = 0.001, \ldots, q_k = 1)$, formed by $K = 101$ elements, but other discretizations can be used without loss of generality.

For a threshold $q$, under the hypothesis of independence of the contrasts investigated by the two experiments, the number of genes in common by chance is calculated as:

$$\frac{O_{1+}(q) \times O_{+1}(q)}{n}.$$

In the 2 × 2 Table, where the marginal frequencies $O_{1+}(q)$, $O_{+1}(q)$ and the total number of genes $n$ are assumed fixed quantities, given $q$, the only random variable is $O_{11}(q)$. The conditional distribution of $O_{11}(q)$ is hypergeometric [7]:

$$O_{11}(q) \sim \mathrm{Hyper}(O_{1+}(q), O_{+1}(q), n). \qquad (1)$$

We then calculate the statistic $T(q)$ as the observed to expected ratio:

$$T(q) = \frac{O_{11}(q)}{\dfrac{O_{1+}(q) \times O_{+1}(q)}{n}}. \qquad (2)$$

In other words, $T(q)$ quantifies the strength of association between lists at cut-off $q$ in terms of the ratio of observed to expected. The denominator is a fixed quantity, so $T(q)$ is a rescaled hypergeometric variable,

$$T(q) \propto \mathrm{Hyper}(O_{1+}(q), O_{+1}(q), n),$$

with mean and variance:

$$E(T(q) \mid O_{1+}(q), O_{+1}(q), n) = 1,$$

$$\mathrm{Var}(T(q) \mid O_{1+}(q), O_{+1}(q), n) = \frac{(n - O_{1+}(q)) \times (n - O_{+1}(q))}{(n - 1) \times O_{1+}(q) \times O_{+1}(q)}.$$

The variance is obtained by dividing the hypergeometric variance of $O_{11}(q)$ by the square of the expected count. Throughout, we use the symbol $|$ to denote conditioning, thus $E(T(q) \mid O_{1+}(q), O_{+1}(q), n)$ indicates the conditional expectation of $T(q)$ given $O_{1+}(q)$, $O_{+1}(q)$ and $n$.

As a first step, we focus attention on the ordinal statistic $T(q_{\max}) \equiv \max_q T(q)$, which represents the maximal deviation from the null model of independence between the two experiments, or equivalently the largest relative increase of the number of genes in common.
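As a rough sketch of the conditional analysis, the Python below computes $T(q)$, its conditional variance and the maximum over a grid, and approximates the null distribution of the maximum by permuting one list, the Monte Carlo scheme the text goes on to describe. All function names are hypothetical, the rule p ≤ q for calling a gene differentially expressed is an assumption, and the (hits + 1)/(S + 1) form of the Monte Carlo p value is one common convention rather than something stated in the paper.

```python
import random


def counts(p_a, p_b, q):
    """Return (O1+, O+1, O11) at cut-off q, calling a gene DE when p <= q."""
    o1p = sum(1 for p in p_a if p <= q)
    op1 = sum(1 for p in p_b if p <= q)
    o11 = sum(1 for a, b in zip(p_a, p_b) if a <= q and b <= q)
    return o1p, op1, o11


def t_ratio(o1p, op1, o11, n):
    """Observed-to-expected ratio T(q); None when a margin is empty."""
    if o1p == 0 or op1 == 0:
        return None
    return o11 * n / (o1p * op1)


def t_variance(o1p, op1, n):
    """Conditional variance of T(q) under the hypergeometric model."""
    return (n - o1p) * (n - op1) / ((n - 1) * o1p * op1)


def t_max(p_a, p_b, grid):
    """Return (max over q of T(q), q_max), skipping cut-offs with empty margins."""
    n = len(p_a)
    best_t, best_q = float("-inf"), None
    for q in grid:
        t = t_ratio(*counts(p_a, p_b, q), n)
        if t is not None and t > best_t:
            best_t, best_q = t, q
    return best_t, best_q


def permutation_p_value(p_a, p_b, grid, n_perm=200, seed=0):
    """Monte Carlo p value for T(q_max): permute list B, keep list A fixed."""
    rng = random.Random(seed)
    observed, _ = t_max(p_a, p_b, grid)
    shuffled = list(p_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null_t, _ = t_max(p_a, shuffled, grid)
        if null_t >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


# Case A1 figures quoted in the paper: 619 genes in common, margins 975 and 730.
print(round(t_ratio(975, 730, 619, 3000), 2))  # about 2.61

# Hypothetical toy lists, identical in both experiments.
toy = [0.001, 0.2, 0.5, 0.9]
print(t_max(toy, toy, [0.01, 0.3, 1.0]))
print(permutation_p_value(toy, toy, [0.01, 0.3, 1.0]))
```

Permuting one list leaves both margins unchanged, so each permuted dataset is a draw from the same conditional null model that equation 1 describes.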
This maximum value is associated with a threshold $q_{\max}$ on the probability measure and with a number $O_{11}(q_{\max})$ of genes in common, which can be selected for further investigations and mined for relevant biological pathways.

The exact distribution of $T(q_{\max})$ is not easily obtained, since the series of 2 × 2 tables are not independent. We thus suggest performing a Monte Carlo permutation test of $T(q)$ under the null hypothesis of independence between the two experiments. To be precise, the probability measures of one list are randomly permuted $S$ times, while those of the other list are kept fixed, leading to $S$ values of the statistic, $T_S(q_{\max})$, which represent the null distribution of $T(q_{\max})$. From these, a Monte Carlo p value for the observed value of $T(q_{\max})$ can be computed and the choice of $S$ adapted to the required degree of precision.

2 × 2 Table: joint model of two experiments

For extreme values of the threshold $q$ ($q \cong 0$), $O_{1+}(q)$ and $O_{+1}(q)$ can be very small. In this case, the denominator of $T(q)$ assumes values smaller than 1 and $T(q)$ explodes, leading to unreliable estimates of the ratio. In addition, the hypergeometric sampling model specified for $T(q_{\max})$ in our previous procedure does not take into account the uncertainty of the margins of the table (since they are all considered fixed).

To address these issues and to improve our statistical procedure, we thus propose to consider a joint model of the experiments, which also treats $O_{1+}(q)$ and $O_{+1}(q)$ as random variables, releasing the conditioning. Furthermore, we specify this in a Bayesian framework, where the underlying probabilities $\theta_i(q)$, $1 \le i \le 4$, with $\sum_{i=1}^{4} \theta_i(q) = 1$, for the four cells in the 2 × 2 contingency table (indexed from left to right) are given a prior distribution. In this way, we account for the variability in $O_{1+}(q)$ and $O_{+1}(q)$ and smooth the ratio $T(q)$ for extreme, small values of $q$.

Table 1. Lists of p values for two experiments

Experiment A | Experiment B
$p^A_1$ | $p^B_1$
$p^A_2$ | $p^B_2$
... | ...
$p^A_n$ | $p^B_n$

Table 2. Contingency table for experiment A and experiment B, given a threshold $q$

 | Experiment B: DE | Experiment B: Non DE | Total
Experiment A: DE | $O_{11}(q)$ | $O_{1+}(q) - O_{11}(q)$ | $O_{1+}(q)$
Experiment A: Non DE | $O_{+1}(q) - O_{11}(q)$ | $n - O_{1+}(q) - O_{+1}(q) + O_{11}(q)$ | $n - O_{1+}(q)$
Total | $O_{+1}(q)$ | $n - O_{+1}(q)$ | $n$

$n$ is the total number of genes and $O_{11}(q)$ is the number of genes in common. DE, differentially expressed. Non DE, non differentially expressed.

Starting from Table 2, we model the observed frequencies as arising from a multinomial distribution:

$$\mathrm{Multi}(O \mid n, \theta(q)) \propto \theta_1(q)^{O_{11}(q)} \times \theta_2(q)^{[O_{1+}(q) - O_{11}(q)]} \times \theta_3(q)^{[O_{+1}(q) - O_{11}(q)]} \times \theta_4(q)^{[n - O_{1+}(q) - O_{+1}(q) + O_{11}(q)]}. \qquad (3)$$

Since we are in a Bayesian framework, we need to specify a prior distribution for all the parameters. The vector of parameters $\theta(q)$ is modeled as arising from a Dirichlet distribution [8]:

$$\theta(q) \sim \mathrm{Dir}(a, a, a, a), \quad a = 0.05,$$

which ensures the constraint $\sum_{i=1}^{4} \theta_i(q) = 1$.

The derived quantity of interest is, as before, the ratio of the probability that a differentially expressed gene is truly common for both experiments, to the probability that a gene is included in the common list by chance:

$$R(q) = \frac{\theta_1(q)}{(\theta_1(q) + \theta_2(q)) \times (\theta_1(q) + \theta_3(q))}. \qquad (4)$$

The Dirichlet prior is conjugate for the multinomial likelihood [8] and the posterior distribution of $\theta(q) \mid O, n$ is again a Dirichlet distribution, given by:

$$\theta(q) \mid O, n \sim \mathrm{Dir}\big(O_{11}(q) + a,\ [O_{1+}(q) - O_{11}(q)] + a,\ [O_{+1}(q) - O_{11}(q)] + a,\ [n - O_{1+}(q) - O_{+1}(q) + O_{11}(q)] + a\big). \qquad (5)$$

This distribution is easily sampled from using standard algorithms. Note that the prior weights $a = 0.05$ can be interpreted as the number of hypothetical counts in each cell observed prior to the investigation.
Further, it can be shown that the variance of the vector of probabilities in the Dirichlet distribution increases as the prior weights tend to zero. Thus, our choice of value of 0.05 for the prior weights allows both high variability and a small influence of the prior specification on the posterior distribution of $\theta(q)$. The posterior distribution of $R(q) \mid O, n$ can be easily derived from that of $\theta(q)$ using, for example, a sample of values of $\theta(q)$ generated from the posterior distribution (equation 5). In particular, from a sample of values of $R(q) \mid O, n$, the 95% two-sided credibility interval, $CI_{95}(q)$, can be easily computed for each $R(q)$.

2 × 2 Table: decision rules for intersection

In the Bayesian context, several decision rules can be envisaged to choose the threshold corresponding to the common list showing a clear evidence of association between experiments. The general principle is as follows: first, select a ratio $R(q)$ according to a decision rule; second, consider the threshold $q$ corresponding to the selected ratio; and third, return the list $O_{11}(q)$, that is, the intersection of the lists for the threshold $q$. Figure 1 (right) shows a typical plot of $R(q)$ and its credibility interval as a function of $q$ in case of associated experiments (a different shape for $R(q)$ is presented in Additional data file 1). As the p value increases, the ratio $R(q)$ decreases and the associated list of common genes $O_{11}(q)$ becomes larger (the number of genes in common for each ratio is indicated on the right axis of the plot). We need a rule to select a threshold on the p value and the corresponding list of genes in common. To this purpose we now discuss two decision rules.

Under the null model of no association between the experiments, $\mathrm{Median}(R(q) \mid H_0) = 1$, so we consider $R(q)$ as indicating departure from independence if its credibility interval does not contain 1.
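The posterior median and credibility interval of $R(q)$ for one cut-off can be approximated by simulation. The sketch below is illustrative only: it draws from the Dirichlet posterior by normalizing independent gamma variates (a standard construction), uses Python's standard library rather than the authors' R package, and the function name, number of draws and seed are hypothetical choices. The counts in the example are the case A1 figures quoted in the paper and serve purely as an illustration.

```python
import random


def posterior_ratio_summary(o11, o1p, op1, n, a=0.05, draws=4000, seed=0):
    """Approximate the posterior median and 95% interval of R(q).

    Cells are indexed as in the contingency table: both DE, A only,
    B only, neither; each posterior weight is the cell count plus a.
    """
    rng = random.Random(seed)
    alphas = [o11 + a, o1p - o11 + a, op1 - o11 + a, n - o1p - op1 + o11 + a]
    ratios = []
    for _ in range(draws):
        # A Dirichlet draw is a vector of gamma draws divided by its sum.
        gammas = [rng.gammavariate(alpha, 1.0) for alpha in alphas]
        total = sum(gammas)
        th1, th2, th3 = (g / total for g in gammas[:3])
        denom = (th1 + th2) * (th1 + th3)
        if denom > 0.0:  # guard against underflow when counts are tiny
            ratios.append(th1 / denom)
    ratios.sort()
    m = len(ratios)
    median = ratios[m // 2]
    lower = ratios[int(0.025 * m)]
    upper = ratios[min(m - 1, int(0.975 * m))]
    return median, lower, upper


median, lower, upper = posterior_ratio_summary(619, 975, 730, 3000)
print(median, lower, upper)
print("interval excludes 1:", lower > 1.0 or upper < 1.0)
```

Repeating the call for every cut-off on the grid gives the curve of medians and intervals on which the decision rules operate.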
As an extension of $T(q_{\max})$ we thus propose to consider the maximum of $\mathrm{Median}(R(q) \mid O, n)$ only for the subset of credibility intervals that do not include 1 and define:

$q_{\max} = \arg\max\{\mathrm{Median}(R(q) \mid O, n)$ over the set of values of $q$ for which $CI_{95}(q)$ excludes 1$\}$. (6)

In other words, $q_{\max}$ is defined to be the threshold associated with the maximum of the ratio, which we denote $R(q_{\max})$. If all credibility intervals contain 1, the maximum of $R(q)$ can still be computed, but we do not associate it with a list since there is no departure from independence that could be considered significant.

Note that in the Bayesian context many $R(q)$ can have a CI that excludes 1 and they all represent a significant deviation from independence. An advantage of the maximum statistic is that it returns a list of interesting features with few false positives (FP), as will be shown later in the simulations. On the other hand, this list is usually rather small and, in cases where the level of noise is substantial, it excludes a large number of true positives (TP), for which the evidence is less strong.

We next consider an alternative to the max ratio: the largest threshold $q$ for which the ratio $R(q) \ge 2$. It is the largest threshold where the number of genes called in common is at least double the number of genes in common under independence:

$q_2 = \max\{$over the set of values of $q$ for which $\mathrm{Median}(R(q) \mid O, n) \ge 2$ and $CI_{95}(q)$ excludes 1$\}$. (7)

Using this rule provides a fair balance between specificity and sensitivity, as we will show later. Indeed, it is expected that when going beyond this point to larger values of $q$, the marginal benefit of adding a few more true positives to the list, and so reducing the false negatives (FN), will be outweighed by the expected larger number of false positives that would also be added. Our simulations show that this rule is indeed close to giving the minimal global error (FP + FN).
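Rules (6) and (7) amount to a filter followed by a maximum, which the following sketch makes explicit. The function name and the input layout, one `(q, median, lower, upper)` tuple per cut-off, are hypothetical, and the numbers in the example are invented for illustration; they are not results from the paper.

```python
def choose_thresholds(summaries, ratio_cutoff=2.0):
    """Apply rules (6) and (7) to posterior summaries of R(q).

    summaries: iterable of (q, median, lower, upper), where lower and upper
    bound the 95% credibility interval of R(q).
    Returns (q_max, q_2); either may be None when the rule does not apply.
    """
    # Keep only cut-offs whose credibility interval excludes 1.
    significant = [s for s in summaries if s[2] > 1.0 or s[3] < 1.0]
    if not significant:
        return None, None
    # Rule (6): cut-off with the largest posterior median.
    q_max = max(significant, key=lambda s: s[1])[0]
    # Rule (7): largest cut-off whose posterior median is at least 2.
    above = [s[0] for s in significant if s[1] >= ratio_cutoff]
    q_2 = max(above) if above else None
    return q_max, q_2


# Hypothetical summaries, ordered by increasing cut-off.
example = [
    (0.01, 2.6, 2.4, 2.8),
    (0.05, 2.2, 2.0, 2.4),
    (0.08, 2.0, 1.9, 2.1),
    (0.20, 1.5, 1.4, 1.6),
    (0.60, 1.05, 0.98, 1.10),
]
print(choose_thresholds(example))
```

When `q_2` comes back as `None`, the text's recommendation is to fall back on the list associated with `q_max`.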
Figure 2 (top) plots the false discovery rate, $FDR = FP(q)/O_{11}(q)$, and the false non-discovery rate, $FNR = FN(q)/(n - O_{11}(q))$, for 50 simulations carried out as described in Materials and methods, for scenario I structure A. It is clear that $R(q_{\max})$ has the smallest FDR. On the other hand, $q_2$ corresponds to the intersection between FDR and FNR. Moreover, in Figure 2 (bottom) we show that the same threshold minimizes the global misclassification error, defined as the sum of false positives and false negatives. Note that if we considered the minimum significant ratio, defined as the minimum of the $R(q)$ over the set of credibility intervals excluding 1, the FDR would increase dramatically and the FNR would decrease only marginally with respect to $R(q_{\max})$ and $R(q_2)$. As expected, the global misclassification error would also be much larger, making this rule inappropriate.

When there are no ratios $R(q)$ equal to or greater than 2 (which can happen in the case of large noise or when there is only a small proportion of genes in common), this rule does not apply and we recommend using the rule corresponding to $R(q_{\max})$.

Our computations have been implemented in the statistical programming language R [9]. The R package for simulating the data, for the two tests and for visualizing the results is called BGcom and is available on our project BGX website [10].
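The error measures plotted in Figure 2 are simple functions of the called list and the set of genes truly in common, which is known in a simulation. The sketch below is a Python illustration and not part of BGcom; the function name, the gene identifiers and the convention of returning 0 when a denominator is empty are all assumptions made for the example.

```python
def error_summary(called, truth, n):
    """FP, FN, FDR, FNR and global error for one list of genes called in common.

    called: genes in the common list, so O11(q) = len(called)
    truth:  genes truly differentially expressed in both experiments
    n:      total number of genes
    """
    called, truth = set(called), set(truth)
    fp = len(called - truth)   # called in common but not truly common
    fn = len(truth - called)   # truly common but missed
    o11 = len(called)
    fdr = fp / o11 if o11 else 0.0          # FP(q) / O11(q)
    fnr = fn / (n - o11) if n > o11 else 0.0  # FN(q) / (n - O11(q))
    return {"FP": fp, "FN": fn, "FDR": fdr, "FNR": fnr, "global_error": fp + fn}


# Hypothetical example with n = 10 genes labelled 0 to 9.
summary = error_summary(called=[0, 1, 4], truth=[0, 1, 2, 3], n=10)
print(summary)
```

Evaluating this summary at each cut-off on the grid traces out curves of the kind shown in Figure 2.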
Figure 1. Typical plots of $T(q)$ and $R(q)$ for associated experiments (case A1). The two associated experiments were simulated under scenario I, structure A, with true differences drawn from a Ga(2.5, 0.4) and experiment-specific noise of 0.5 and 0.8, respectively (signal-to-noise ratio = 9.6). The left plot shows the distribution of $T(q)$ and the right one shows the distribution of $R(q)$ with Bayesian credibility intervals at 95%. $T(q)$ shows a deviation from 1 for a p value between 0.01 and 0.5. $T(q_{\max})$ is 2.6 and corresponds to a threshold $q = 0.01$. $R(q)$ presents the same trend, but the estimates are slightly smaller since the model takes into account the variability of the margins of the 2 × 2 table. The threshold associated with $R(q) = 2$ is 0.08. The number of genes in common for each ratio $R(q)$ is reported on the right axis of each plot.
[Figure 1: left panel, T against the p value threshold, with T max marked; right panel, R against the p value threshold, with R max and R = 2 marked; the right axis of each panel gives the number of genes in common.]

Performance on simulated data

Besides assessing the operating characteristics of our proposed rules, we also applied the method proposed by Hwang et al., implemented in Matlab [11]. Note that their aim is to integrate p values from different experiments in a meta-analysis and they present three statistics to do so: Fisher's weighted F, Mudholkar-George's weighted T and Liptak-Stouffer's weighted Z.
We report Fisher's weighted F (the default statistic in the Matlab function), defined as:

$$F_g = -2 \sum_{k=1}^{2} w_k \ln(p_{gk}),$$

where $w_k$ is the weight for the $k$th experiment and $p_{gk}$ is the p value for the gene $g$ in the experiment $k$. $F_g$ gives a new global p value for gene $g$ that integrates the weighted p values from the different experiments. The authors also present several rules to select differentially expressed genes from $F_g$, the simplest one using a fixed threshold on the p values equal to 0.05, and others that minimize the number of false positives and false negatives, in a parametric or non-parametric framework. We follow the authors' suggestion and use the non-parametric rule. For more details on the method, see [2].

The behavior of $T(q)$ and of the credibility intervals $CI_{95}(q)$ for a typical simulation is displayed in Figure 1 (associated experiments) and Figure 3 (independent experiments). When the two experiments are not associated (the number of simulated genes in common is equal to 0), the plot of $T(q)$ for different cut-offs $q$ is, as expected, a horizontal line of height 1, with evidence of noise for small p values. In the same figure (Figure 3), one sees that all the credibility intervals derived by the Bayesian procedure include the value 1 and have decreasing width as $q$ gets larger, as expected.

In the case of two independent experiments we never declare any gene to be in common in any of the 50 simulations, so our procedure has no error. On the other hand, Hwang et al.'s method picks up 320 genes on average (Table 3, independence case), which are all false positives.

When there is a positive association between the two experiments, $T(q)$ can assume two shapes: it can decrease monotonically as the p values increase (Figure 1), or reach a peak and then decrease (Additional data file 1) as the p values increase. The Bayesian estimates exhibit a similar shape, but since in this approach the variability of the denominator of $T(q)$ is modeled, the resulting ratio estimates are smoothed.
We see that our proposed method gives a sensible and interpretable procedure, with a pattern that is easily distinguishable from that of the no association case. This is confirmed by the results given in Table 4.

Scenario I mimics a realistic situation where the two experiments have different degrees of differential expression and consequently quite different list sizes at any given significance level. It supposes that the list of genes is divided into four groups: genes differentially expressed in both experiments, genes differentially expressed in only one of the two experiments (one group for each experiment), and genes differentially expressed in neither experiment. The first group identifies the 'true positive genes' that we want to detect by our method. The remaining groups act like additional noise to make the set up more realistic. We also define a different scenario (scenario II) to mimic a situation where the two experiments have similar size of differential expression. It only supposes the genes are divided into two groups: differentially expressed genes in both experiments and differentially expressed genes in no experiment. We describe the simulation set up in detail in Materials and methods.

Figure 2. Misclassification error, false discovery and false non-discovery rates for case A2 (results are averaged over 50 replicates). The upper plot shows the false discovery rate (FDR) and the false non-discovery rate (FNR) for case A2. The FDR is calculated as the ratio of the false positives to the number of genes called in common, while the FNR is calculated as the ratio of the false negatives to the number of genes not called in common. The true differences $d_g$ are drawn from a Ga(2, 0.5) and the experiment-specific noise component is 2 for the first experiment and 3 for the second. $R(q_{\max})$ shows the minimum FDR.
On the other hand, $R(q_{\min})$ has a very large FDR and the improvement of the FNR is slight. As a compromise, the threshold $q_2$ is close to $q_{\max}$, so guarantees a low FDR, but returns a larger list. It approximately corresponds to the intersection point between the two curves of FDR and FNR. The lower plot shows the global error as the sum of FP and FN. The threshold associated with $R(q_2)$ is very close to the minimum of the curve, that is, to the smallest global misclassification error.
[Figure 2: upper panel, FDR and FNR against the p value threshold; lower panel, FP + FN against the p value threshold; the thresholds q max, q 2 and q min are marked on the horizontal axis of both panels.]

In both scenarios, structure A refers to experiments where there would be a large proportion of genes in common relative to the total number of differentially expressed genes. Case A1 is characterized by a large true difference between conditions and a small experiment-specific error, giving an average signal-to-noise ratio of 9.6. Our first rule returns a ratio $T(q_{\max}) = 2.61$ that is associated with $q_{\max} = 0.01$. In this case the average number of genes in the common list associated with the max ratio is $O_{11}(q_{\max}) = 619$, while that expected is $\frac{975 \times 730}{3000} = 237$, and the permutation based test returns a significant Monte Carlo p value ≤ 0.001. The Bayesian ratio $R(q_{\max})$ is slightly smaller than $T(q_{\max})$; accounting for variability in the Bayesian model results in wide CIs for small p values, as previously pointed out. Our methodology gives excellent results in this case, with the sum of false positives and false negatives equal to 89, while the FDR is 0.006 and the FNR is 0.036.
Moving from $q_{\max}$ to $q_2$, the number of genes called in common by this procedure is 676, which is very close to the true number of common genes set in the simulation (700). The number of false positives is larger than the one corresponding to $q_{\max}$, but still quite small, whilst the number of false negatives decreases appreciably, so that the global error reaches its minimum value (83). Note that both $q_{\max}$ and $q_2$ generate a far smaller global error than Hwang et al.'s procedure (Table 3).

Moving to case A2, the noise associated with the experiment increases and the true differences between conditions are smaller. This results in fewer genes called in common and a corresponding increase in the global error. Nevertheless, all the cases present the same trend: $q_{\max}$ is associated with the synthesized list having the smallest number of false positives and the list given by $q_2$ is close to the one with the smallest global error. Moreover, for both cut-offs our methodology consistently leads to smaller errors than that of Hwang et al.

Figure 3. Typical plots of $T(q)$ and $R(q)$ in the case of independent experiments. The two independent experiments are simulated under scenario I, structure A, with true differences drawn from a Ga(1, 1) and experiment-specific noise of 2 and 2.5, respectively (signal-to-noise ratio = 0.4). The left plot shows the distribution of $T(q)$ and the right one shows the distribution of $R(q)$ with Bayesian credibility intervals at 95%. $T(q)$ follows a horizontal line of height 1 (independence between the lists) and presents instability for small p values (left tail). The Bayesian model does not present any significant threshold for which $R(q)$ deviates from 1 and the $CI_{95}$ always includes 1.
[Figure 3: left panel, T against the p value threshold; right panel, R against the p value threshold.]

Simulations under structures B and C mimic cases where there is a smaller proportion of genes in common relative to the total number of differentially expressed genes. For cases B1 and C1 the noise is very small and the true difference between conditions is large; cases B2 and C2 are characterized by a smaller true difference and a higher noise. The pattern remains the same as in cases A1 and A2: the list associated with $q_{\max}$ shows the smallest number of false positives, while the one associated with $q_2$ is very close to the minimum global error. Again our rules show a far smaller global error than those of Hwang et al. Note that for cases B1 and C1, there is no $q_2$ and $q_{\max}$ is associated with the smallest global error. Additional simulations are presented in Tables 1 and 2 of Additional data file 1.

Scenario II shows a similar trend, confirming that our method also works well in a different experimental framework. We still find very few false positives with both rules $q_{\max}$ and $q_2$. On the other hand, the sensitivity is generally higher than in scenario I for both rules, hence the global error is smaller. This results in a better performance of the maximum $q_{\max}$: it shows no false positives in any of the cases of this scenario and, since the false negatives are generally fewer, its global error is quite small and, in some cases, smaller than the one for $q_2$.
Hwang et al.'s method shows an improvement in terms of false positives with respect to scenario I, while the false negatives remain much the same. This is to be expected because, in this scenario, the intersection and the union of differentially expressed genes are identical. Nevertheless, our method still performs better in most cases in this scenario, with the exception of case A2, where our global error is 509 for the q_2 rule while Hwang et al.'s is 450. However, we still halve the number of false positives. See Tables 3 and 4 of Additional data file 1 for the results under scenario II.

Common features related to ventilation-induced lung injury

We applied our methods to lists of p values for 2,769 mouse and rat orthologs deriving from a study investigating the deleterious effects of mechanical ventilation on lung gene expression through a model of mechanical ventilation-induced lung injury (VILI; see Materials and methods for details of this study). Results from the joint model are summarized in Table 5 and the plots are presented in Figure 2 of Additional data file 1. The conditional model returns nearly identical results. Owing to the large variability there is no threshold associated with R(q) ≥ 2, so we present the results related to q_max. The number of differentially expressed genes common to both species is estimated as 97, which corresponds to 63 orthologs (note that each probeset of one species can be associated with several probesets of the other).
These are presented in Additional data file 1, which shows the number of ortholog pairs in common out of the number of ortholog pairs measured.

Table 3
Performance of Hwang et al.'s method on simulated data for scenario I

Case       DE      nonDE   FP (%)       TP (%)       FN (%)       TN (%)         Global error   Global error R(q_2)

Independent case: n = 3000, common = 0, DE1 = 1000, DE2 = 800
           320     2,680   320 (10.7)   0            0            2,680 (89.3)   320            0

A: n = 3000, common = 700, DE1 = 1000, DE2 = 800
Case A1    1,121   1,879   440 (19.1)   681 (97.3)   19 (2.7)     1,860 (80.9)   459            82
Case A2    409     2,591   188 (8.2)    221 (31.6)   479 (68.4)   2,112 (91.8)   667            544

B: n = 3000, common = 200, DE1 = 700, DE2 = 500
Case B1    999     2,001   805 (28.8)   194 (97.0)   6 (3.0)      1,996 (71.2)   811            31*
Case B2    427     2,573   333 (11.9)   94 (47.0)    106 (53.0)   2,467 (88.1)   439            165

C: n = 3000, common = 100, DE1 = 500, DE2 = 400
Case C1    816     2,185   718 (24.8)   97 (97.1)    3 (2.9)      2,182 (75.2)   721            19*
Case C2    346     2,654   299 (10.3)   47 (47.0)    53 (53.0)    2,601 (89.7)   352            84

Average simulation results: we present the results from Hwang et al.'s method on the simulated data under scenario I. DE1 and DE2 are the differentially expressed genes in the first and the second experiment respectively. We used Fisher's weighted F, defined as $F_g = -2 \sum_{k=1}^{2} w_k \ln(p_{gk})$, where w_k is the weight for the kth experiment and p_gk is the p value for gene g in experiment k. We present the non-parametric rule to select the differentially expressed (DE) genes, as suggested by the authors. The method is implemented in Matlab. In the last column we report the global error (FP + FN) of our procedure for q_2 (see Table 2) for ease of comparison. *There is no ratio larger than 2, so the maximum rule has been used in this case.
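The weighted Fisher statistic quoted in the caption of Table 3 is simple to evaluate for a single gene. The sketch below only illustrates the formula F_g = -2 Σ_k w_k ln(p_gk); it is not Hwang et al.'s Matlab implementation. The function name `weighted_fisher` is hypothetical, and how the weights are chosen and how the non-parametric selection rule works are outside its scope.

```python
import math
from typing import Sequence


def weighted_fisher(p_values: Sequence[float], weights: Sequence[float]) -> float:
    """Return F_g = -2 * sum_k w_k * ln(p_gk) for one gene g.

    p_values[k] is the p value of the gene in experiment k and
    weights[k] is the weight given to that experiment.
    """
    if len(p_values) != len(weights):
        raise ValueError("need one weight per experiment")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("p values must lie in (0, 1]")
    return -2.0 * sum(w * math.log(p) for w, p in zip(weights, p_values))


# Illustrative p values (not taken from the paper) for two experiments.
print(weighted_fisher([0.01, 0.20], weights=[1.0, 1.0]))
```

With unit weights this reduces to Fisher's classical combined statistic, which grows whenever either p value is small; this is why a statistic of this kind responds to the union of the two lists rather than to their intersection.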
We compared our results to those obtained applying Hwang et al.'s method, also presented in Table 5. The latter picked 1,425 globally differentially expressed genes using the non-parametric rule. The 97 genes in common found by our method are included in their list, which is not surprising since ours focuses on the intersection of the two lists of p values, while theirs tests their union.

Table 4
Performance on simulated data for scenario I

Rule      q      R      CI95        O11   O1+     O+1   FP (%)     TP (%)       FN (%)       TN (%)          Global error

Independence case: n = 3000, common = 0, DE1 = 1000, DE2 = 800
Independence: signal to noise ratio = 0.4‡
          0.55   1*     0.98-1.02   0†    0†      0†    0          0            0            3,000 (100.0)   0

A: n = 3000, common = 700, DE1 = 1000, DE2 = 800
Case A1: signal to noise ratio = 9.6‡
Max       0.01   2.60   2.50-2.72   619   975     730   4 (0.2)    615 (87.8)   85 (12.2)    2,296 (99.8)    89
Double    0.06   2.04   1.97-2.19   676   1,095   877   29 (1.3)   647 (92.4)   53 (7.6)     2,271 (98.7)    82
Min§ = 81
Case A2: signal to noise ratio = 1.6‡
Max       0.01   4.72   4.19-5.29   86    346     157   1 (0.0)    85 (12.1)    615 (87.9)   2,299 (100.0)   616
Double    0.08   2.01   1.90-2.20   212   677     459   28 (1.2)   184 (26.3)   516 (73.7)   2,272 (98.8)    544
Min§ = 535

B: n = 3000, common = 200, DE1 = 700, DE2 = 500
Case B1: signal to noise ratio = 9.6‡
Max¶      0.01   1.72   1.58-1.86   185   691     467   8 (0.3)    177 (88.5)   23 (11.5)    2,792 (99.7)    31
Min§ = 31
Case B2: signal to noise ratio = 1.6‡
Max       0.01   2.98   2.38-3.71   36    250     145   3 (0.1)    33 (16.7)    167 (83.3)   2,797 (99.9)    170
Double    0.03   2.03   1.67-2.40   57    355     236   11 (0.4)   46 (23.0)    154 (77.1)   2,789 (99.6)    165
Min§ = 165

C: n = 3000, common = 100, DE1 = 500, DE2 = 400
Case C1: signal to noise ratio = 9.6‡
Max¶      0.01   1.48   1.30-1.67   95    500     383   7 (0.2)    88 (88.4)    12 (11.6)    2,893 (99.8)    19
Min§ = 19
Case C2: signal to noise ratio = 1.6‡
Max       0.01   2.93   2.16-3.83   20    214     96    3 (0.1)    17 (16.6)    83 (83.4)    2,897 (99.9)    86
Double    0.02   2.16   1.63-2.81   26    262     134   5 (0.2)    21 (21.0)    79 (79.0)    2,895 (99.8)    84
Min§ = 84

Average simulation results: we show the results from the joint model on one case of simulated data for independent experiments and six cases of simulated data for two associated experiments. The simulation scenario consists of four groups of genes: differentially expressed (DE) in both experiments, differentially expressed in only one experiment (DE1 and DE2 respectively), and differentially expressed in neither experiment. For the independence case, the number of genes differentially expressed in both experiments was set to 0. We present two decision rules: the threshold associated with the maximum R(q) is q_max and the threshold associated with R(q) ≥ 2 is q_2 (called 'double' in the table). We define q_max = arg max{Median(R(q) | O, n) over the set of values of q for which CI95(q) excludes 1} and q_2 = max{q over the set of values of q for which CI95(q) excludes 1 and Median(R(q) | O, n) ≥ 2}. We averaged the results over 50 repeats for each case. *In the case of independence it is still possible to calculate the maximum of R(q), but it is not significant, so there is no associated list of common genes. †All the CIs contain 1, so no genes are called in common; thus, there are no FP. ‡The signal to noise ratio is calculated as E(Ga(shape, 1/scale))/(r_1/2 + r_2/2). §Minimum global error (observed). ¶There is no ratio larger than 2 and only the maximum rule has been reported.

Table 5
Results from the VILI experiment

Joint Bayesian model                                     Hwang et al.'s method
q_max   R(q_max)   O11   O1+   O+1   CI95        DE      nonDE
0.01    1.43       97    393   886   1.13-1.75   1,425   3,734

The number of genes in common is 97, which corresponds to 63 orthologs. The conditional model shows the same results (not reported). The procedure clearly indicates a significant association between the two lists. Hwang et al.'s method calls 1,425 genes as differentially expressed (DE). All the genes reported by our method are included in their list.
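The two decision rules defined in the caption of Table 4 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: it assumes the ratio compares the observed number of common genes O11 with the number expected under independence, O1+ × O+1 / n, and it takes the posterior median and 95% credibility interval of R(q) as given inputs rather than deriving them from the Bayesian model. The names `observed_to_expected`, `RatioSummary`, `q_max` and `q_2` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


def observed_to_expected(o11: int, o1_plus: int, o_plus1: int, n: int) -> float:
    """Observed common genes divided by the count expected under independence."""
    expected = o1_plus * o_plus1 / n
    return o11 / expected


@dataclass(frozen=True)
class RatioSummary:
    """Posterior summary of R(q) at one threshold q (assumed precomputed)."""
    q: float
    median: float
    ci_low: float
    ci_high: float

    def excludes_one(self) -> bool:
        # The 95% interval excludes 1 when it lies entirely on one side of 1.
        return self.ci_low > 1.0 or self.ci_high < 1.0


def q_max(summaries: Sequence[RatioSummary]) -> Optional[float]:
    """Threshold with the largest median R(q) among those whose CI excludes 1."""
    eligible = [s for s in summaries if s.excludes_one()]
    if not eligible:
        return None
    return max(eligible, key=lambda s: s.median).q


def q_2(summaries: Sequence[RatioSummary]) -> Optional[float]:
    """Largest threshold whose CI excludes 1 and whose median R(q) is at least 2."""
    eligible = [s.q for s in summaries if s.excludes_one() and s.median >= 2.0]
    return max(eligible) if eligible else None


# Only the two thresholds reported for case A1 in Table 4, not the full curve.
case_a1 = [RatioSummary(0.01, 2.60, 2.50, 2.72), RatioSummary(0.06, 2.04, 1.97, 2.19)]
# The single threshold reported for the independence case in Table 4.
independence = [RatioSummary(0.55, 1.00, 0.98, 1.02)]

print(q_max(case_a1), q_2(case_a1), q_max(independence))
print(round(observed_to_expected(619, 975, 730, 3000), 2))
```

For the 'Max' row of case A1 the plain observed-to-expected ratio is about 2.61, close to the reported value of 2.60. For the independence case both rules return nothing, matching the table footnote that no genes are called in common when every interval contains 1.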
This difference is highlighted in Figure 4 (left), which plots mouse fold change versus rat fold change on the natural logarithmic scale: it is apparent that genes highlighted by Hwang et al.'s method but not by ours (+) have a log fold change close to 0 for one of the species, while the genes highlighted by both methodologies (o) present large fold changes for both species. The correlation between the fold changes measured in the two experiments is 0.4 for the 97 orthologs returned by our procedure and 0.06 for the other 1,328 genes picked up only by Hwang et al.'s method, confirming that our methodology focuses attention on the genes differentially expressed in both experiments.

We used fatiGO [12] to annotate the common set of orthologs found by our analysis: 24 genes are involved in one or more pathways described in the Kyoto Encyclopedia of Genes and Genomes (KEGG), 42 are annotated at the third level of the Gene Ontology (GO) as part of biological processes, 41 belong to molecular functions and 36 to cellular components. See Additional data file 2 for the complete list of GO categories and KEGG pathways. Among the biological processes, the most represented are related to the integrated function of a cell ('cellular physiological process', 'metabolism', 'regulation of cellular process', 'regulation of physiological process'), showing between 38 and 15 orthologs in common. In addition, there are some other interesting processes related to responses of the body to stress and to external or endogenous stimuli; these can be related to the effect of mechanical ventilation, which acts as an external stimulus and also causes stress on cells. From the KEGG pathways, we focus attention on the two most represented categories: the 'MAPK signaling activity' and the 'Cytokine-cytokine receptor interaction'.
Six of the orthologs found to be significant are involved in the first (Fgfr1, Gadd45a, Hspa8, Hspa1a, Il1b, Il1r2). The involvement of this pathway is again suggestive of how mechanical [...]

Figure 4
Log fold change (natural log) for the VILI experiment (left) and high-fat diet experiment (right). The left plot shows the log fold changes for mice versus rat averaged over the two replicates for each species. The right plot shows the log fold changes for fat versus muscle averaged over the three and four replicates for each species. The circles correspond to the genes highlighted by our analysis and by the method of Hwang et al.; they are characterized by a large log fold change for both species. The correlation of the two fold changes for this group is 0.4 (VILI experiment) and 0.8 (high-fat diet experiment). The crosses correspond to the genes highlighted only by Hwang et al.'s analysis; they are characterized by a large log fold change for one species and a small fold change for the other. The correlation of the two fold changes for this group is 0.06 (VILI experiment) and 0.36 (high-fat diet experiment).
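The group-wise correlations quoted for Figure 4 can be computed along the following lines. This is a minimal sketch with made-up log fold changes, assuming the Pearson correlation is the one intended (the text does not say which coefficient was used). The function names and the toy data are hypothetical and do not reproduce the paper's values.

```python
import math
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_by_group(
    lfc_first: Sequence[float],
    lfc_second: Sequence[float],
    in_common: Sequence[bool],
) -> Tuple[float, float]:
    """Correlate log fold changes separately for common and non-common genes."""
    common: List[Tuple[float, float]] = []
    other: List[Tuple[float, float]] = []
    for a, b, flag in zip(lfc_first, lfc_second, in_common):
        (common if flag else other).append((a, b))
    r_common = pearson([a for a, _ in common], [b for _, b in common])
    r_other = pearson([a for a, _ in other], [b for _, b in other])
    return r_common, r_other


# Toy natural-log fold changes (hypothetical, for illustration only).
lfc_mouse = [1.0, 2.0, -1.5, 0.1, 1.8, -0.05]
lfc_rat = [0.9, 1.7, -1.2, 1.5, 0.0, -1.4]
called_by_both = [True, True, True, False, False, False]

r_common, r_other = correlation_by_group(lfc_mouse, lfc_rat, called_by_both)
print(round(r_common, 2), round(r_other, 2))
```

In the toy data the genes flagged as common change in the same direction in both experiments and correlate strongly, while the remaining genes have a large change in only one experiment and correlate weakly, which is the qualitative pattern the caption describes.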
[Figure 4 plots. Left panel axes: mice log fold change and rats log fold change. Right panel axes: fat log fold change and muscle log fold change. The plotted points are not recoverable.]

[...] chronic inflammation in adipose tissue [13]. Indeed, the top KEGG category in our analysis is 'Cytokine-cytokine receptor interaction', and includes the genes Ccl2 and Tnfrsf1b. The Ccl2 gene encodes a C-X-C family cytokine that is a ligand for the receptor Ccr2, a key mediator of diet-induced obesity and insulin resistance [14]. Tnfrsf1b encodes a receptor for tumor necrosis factor, an inflammatory cytokine [...]
[...] to be related to each function instead of each gene, using for instance the methodology for global testing of biological functions described in Goeman et al. [24].

Intersecting lists of differentially expressed features is a natural way to synthesize experiments, but calls for a statistical procedure to choose the cut-off on the ranked lists that is best for balancing specificity and [...]

[...] Bohlooly-Y M: Growth hormone receptor deficiency results in blunted ghrelin feeding response, obesity, and hypolipidemia in mice. Am J Physiol Endocrinol Metab 2006, 290:E317-325.

Liu Y, Nakagawa Y, Wang Y, Sakurai R, Tripathi PV, Lutfy K, Friedman TC: Increased glucocorticoid receptor and 11beta-hydroxysteroid dehydrogenase type 1 expression in hepatocytes may contribute to the phenotype of type 2 diabetes [...]

[...] clearly show that our procedure gives a coherent list of genes that are differentially expressed in both species and is consequently a powerful procedure for finding common pathways of interest. The size of the differentially expressed genes for each cut-off q is quite different for the two tissues. This is an example of the simulated scenario I, where we clearly also expect to have genes differentially expressed [...]

[...] structure C, we set 100 genes in common, out of 500 differentially expressed in the first experiment, 400 differentially expressed in the second experiment and 300 differentially expressed in the third experiment. We simulated four replicates for each condition in each experiment and used Cyber-T [27] to analyze the two experiments separately and to obtain the lists of p values. Cyber-T is a statistical program [...]
[...] CI95 of 1.90-2.21. In this case the number of common genes is 226.

Common features related to high-fat diet

Five of the orthologs found to be significant by our methodology belong to the 'Cytokine-cytokine receptor interaction' category (IL6, Il1b, Il1r2, Ccl2, Kit). This again suggests an involvement of immune response in VILI for both species. As was already ascertained [...]

[...] genes differentially expressed only in the second experiment and genes differentially expressed in neither experiment. In the second scenario, called scenario II, we divided the genes into only two groups: genes differentially expressed in both experiments and genes differentially expressed in neither experiment. Scenario II is thus a particular case of scenario I, which assumes strong communality between [...]

[...] be an inducer of insulin resistance in adipose tissue [15,16]. It is particularly interesting, therefore, to see that inflammatory genes are also perturbed in muscle by the switch to a high-fat diet, suggesting that similar molecular events are brought about in these two tissues in response to the change in diet. Another interesting category at the top of the list is 'Neuroactive ligand receptor interaction', [...]

[...] the suppression of the immune system. Activation of expression of NR3C1 within the liver may contribute to the development of type 2 diabetes in mice [18] and it has a role in liver glucose metabolism during fasting and in diabetic mice [19]. It would be very interesting to further investigate if its role is maintained in other tissues besides fat and muscle, as suggested by our analysis. The proposed [...]
[...] Shimomura I: Increased oxidative stress in obesity and its impact on metabolic syndrome. J Clin Invest 2004, 114:1752-1761.

Petersen KF, Dufour S, Shulman GI: Decreased insulin-stimulated ATP synthesis and phosphate transport in muscle of insulin-resistant offspring of type 2 diabetic parents. PLoS Med 2005, 2:879-884.

Roy S, Mitra S: An introduction to some nonparametric generalizations of analysis of variance [...]

[...] aim is to synthesize lists of differentially expressed genes. But we stress that our methodology is applicable to synthesize ranked lists of any feature of interest from a variety of experiments, [...]

Date posted: 14/08/2014, 20:22


Table of contents

  • Abstract

  • Background

  • Results

    • 2 × 2 table: conditional model for two experiments

    • 2 × 2 table: joint model of two experiments

    • 2 × 2 table: decision rules for intersection

    • Performance on simulated data

    • Common features related to ventilation-induced lung injury

    • Common features related to high-fat diet

    • Modeling three way contingency tables

    • Discussion

    • Conclusion

    • Materials and methods

      • Simulated data

      • Simulated data for three lists

      • Simulated data for a correlated set of genes

      • Publicly available dataset: synthesizing VILI between two species

      • Publicly available dataset: effect of high-fat diet versus normal fat diet in mice fat and muscle

      • Additional data files

      • Acknowledgements

      • References
