MVRM: A Hybrid Approach to Predict siRNA Efficacy

2015 Seventh International Conference on Knowledge and Systems Engineering

MVRM: A hybrid approach to predict siRNA efficacy

Bui Ngoc Thang, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuanthuy, Caugiay, Hanoi, Vietnam. Email: thangbn@vnu.edu.vn
Le Sy Vinh, University of Engineering and Technology, Vietnam National University, Hanoi, 144 Xuanthuy, Caugiay, Hanoi, Vietnam. Email: vinhls@vnu.edu.vn
Ho Tu Bao, School of Knowledge Science, Japan Advanced Institute of Science and Technology. Email: bao@jaist.ac.jp

978-1-4673-8013-3/15 $31.00 © 2015 IEEE. DOI 10.1109/KSE.2015.29

Abstract: The discovery of RNA interference (RNAi) has led to the design of novel drugs for different diseases. Selecting short interfering RNAs (siRNAs) that can knock down target genes efficiently is one of the key tasks in studying RNAi. A number of predictive models have been proposed to predict the knockdown efficacy of siRNAs; however, their performance is still far from expectations. This work aims to develop a predictive model that improves siRNA knockdown efficacy prediction. The key idea is to combine the rule-based and the model-based approaches. To this end, views of siRNAs that integrate available siRNA design rules are first learned using an adaptive Fuzzy C-Means (FCM) algorithm. The learned views and other properties of siRNAs are then combined into final representations of siRNAs, and the elastic net regression method is employed to learn a predictive model from these final representations. Experiments on benchmark datasets showed that the proposed method achieved stable and accurate results in comparison with other methods.

I. INTRODUCTION

RNA interference (RNAi) is a cellular process in which long double-stranded RNA (dsRNA) duplexes or hairpin precursors are cleaved into short interfering RNAs (siRNAs) by the ribonuclease III enzyme Dicer. The siRNAs bind to the RNA-induced silencing complex (RISC) and are unwound into sense and antisense strands; the antisense strands then bind to their complementary target mRNAs and induce their degradation. In 1998, Fire and Mello discovered the important role of dsRNAs while studying RNAi in the nematode worm Caenorhabditis elegans; they were awarded the 2006 Nobel Prize in Physiology or Medicine for their contributions to RNAi research. Studies on RNAi have had an immense impact on biomedical research and have made RNAi a valuable tool for designing novel medical applications [27], [7], [13], [25], [17], [10]. In RNAi research, synthesizing highly effective siRNAs is a crucial task in designing novel drugs for the treatment of diseases such as influenza A virus, HIV, hepatitis B virus, and RSV infections, as well as cancer. As a consequence, siRNA-based silencing is considered one of the most promising techniques for future therapy, and predicting the knockdown efficacy of siRNAs is an essential problem for effective siRNA selection [39], [40], [28], [31], [32], [33], [34], [35].

A number of algorithms have been proposed to design effective siRNAs and predict their efficacy. They can be categorized into two approaches: the rule-based approach and the model-based approach [14], [18], [23].

The rule-based approach proposes rules to generate effective siRNAs. These rules were empirically designed and examined on small datasets. The first rational siRNA design rule was proposed by Elbashir et al. [6], who suggested that siRNAs of 19–21 nt with short nucleotide overhangs at the ends can efficiently degrade target mRNAs. Scherer et al. [22] found that thermodynamic properties are important characteristics for designing effective siRNAs that inhibit specific target mRNAs. Soon after that, various rational design rules for generating effective siRNAs were proposed [21], [30], [1], [15], [37], [38]. For example, Ui-Tei and colleagues [30] examined 72 siRNAs targeting six genes and discovered four criteria for effective siRNA design: (i) A or U at position 19, (ii) G or C at position 1, (iii) at least five U or A residues in positions 13–19, and (iv) no long GC stretch. Amarzguioui and co-workers [1] analyzed 46 siRNAs targeting different genes and reported the following six criteria for effective siRNA design: (i) ΔT3 = T3 − T5, the difference between the numbers of A/U residues in the three terminal positions at the 3' end and at the 5' end (relative to the sense strand of the siRNA), where a large ΔT3 is positively correlated with efficacy; (ii) a G or C residue at position 1, positively correlated; (iii) a U residue at position 1, negatively correlated; (iv) an A residue at position 6, positively correlated; (v) A or U at position 19, positively correlated; and (vi) G at position 19, negatively correlated.

However, the rule-based approach has not been satisfactory. About 65% of siRNAs generated by these rules failed when experimentally tested; in particular, they did not reach 90% inhibition, and nearly 20% of them were inactive [20]. The main reason is that siRNA design rules were empirically derived from small datasets of siRNAs synthesized for specific genes. Therefore, they are generally poor at individually designing highly effective siRNAs.

The model-based approach includes predictive models learned from larger datasets by different machine learning techniques. These predictive models are more accurate and reliable than rule-based methods [24]. For example, Huesken and co-workers [12] proposed a new algorithm, BIOPREDsi, by applying artificial neural networks to a dataset of 2431 scored siRNAs. This dataset has been widely used as a benchmark to train and test other predictive models such as ThermoComposition21 [24], DSIR [28], i-Score [14], and Scales [36]. These predictive models are currently considered the best predictors [18], [36]. More recently, Sciabola et al. [23] employed three-dimensional structural information of siRNAs to increase the performance of their model, and a stable predictive model called BiLTR [3] was developed to predict the knockdown efficacy of siRNAs.

Although model-based methods are better than rule-based methods, they suffer from some drawbacks: their performance is still low and unstable. The predictive ability of these models decreases considerably and varies when they are tested on independent datasets, as shown by the performance of 18 current models on three independent datasets [23]. Our analyses reveal two main reasons for this. (1) siRNA datasets were provided by different groups under different protocols in different scenarios [16], [41], so the distributions of these datasets are very different and siRNA data are heterogeneous. (2) The performance of machine learning methods depends heavily on the choice of the data representation (or features) to which they are applied. In previous models, siRNAs were encoded by binary, spectral, tetrahedron, and sequence representations. However, because of the diversity of siRNA distributions and the unsuitable measures built on these representations, they can be inappropriate for building a good model for predicting siRNA efficacy.

In this paper, we develop a hybrid approach, named MVRM, to predict siRNA knockdown efficacy. The method combines design rules and machine learning methods to build a predictive model. To this end, we focus on the representation of siRNAs. Available siRNA design rules are treated as prior background knowledge for generating views that represent siRNAs, with each view capturing the characteristics of one design rule. These views are then learned with the fuzzy C-means algorithm. A new representation of siRNAs is composed of the learned views and other properties of siRNAs such as melting temperature, molecular weight, and thermodynamic values. After transforming siRNAs into the new representation, a predictive model is learned with a regularized method, the elastic net, to predict the knockdown efficacy of siRNAs. Our method is experimentally compared with other methods on benchmark datasets, and the experiments show promising results: the performance of MVRM is comparable to or better than that of the other methods.

II. METHODS

Our model, MVRM, is a hybrid of the rule-based and the model-based approaches, so it consists of two main phases: (1) learning siRNA views from design rules to build new representations of siRNAs, and (2) building a predictive model from these new representations to predict the knockdown efficacy of siRNAs.

A. Learning siRNA views

We are given a dataset of m siRNA sequences S = {s_1, s_2, ..., s_m} of the same length n, where the knockdown efficacy of sequence s_i ∈ S is e_i (i = 1, ..., m), and a set of k design rules R = {r_1, r_2, ..., r_k} collected from previous rule-based studies. Learning siRNA views includes four steps: encoding siRNAs by content, encoding rules to views, learning siRNA views, and encoding siRNAs by learned views.

Encoding siRNAs by content: Each siRNA is a sequence of n nucleotides, such as "GAAAGGAAUUGUAUAAAUC". There are five well-studied characteristics of an siRNA [26]: (1) GC content, (2) the difference in the number of A/U nucleotides at the two ends (ΔT), (3) GC stretch, and (4), (5) the numbers of A/Us in the five and seven positions at the 5' end of the antisense strand. This step encodes each siRNA sequence s_i (i = 1, ..., m) by a matrix M_i of size 4 × (n + 5), in which the rows represent the nucleotide types and the n + 5 columns represent the n nucleotides and the five siRNA characteristics. The first n columns represent the n nucleotides, i.e., column c (c = 1, ..., n) is a binary vector of size 4 × 1 representing the nucleotide at position c of the siRNA sequence. Specifically, the four nucleotides A, C, G, and U are encoded by the vectors (1, 0, 0, 0)^T, (0, 1, 0, 0)^T, (0, 0, 1, 0)^T, and (0, 0, 0, 1)^T, respectively. The last five columns of the matrix represent the five characteristics of the siRNA; they are computed and encoded as described in Table I. The encoding matrix M of the 19-nucleotide siRNA sequence "GAAAGGAAUUGUAUAAAUC" is shown in Table II.

TABLE I. THE FIVE WELL-STUDIED CHARACTERISTICS OF SIRNA SEQUENCES
Property | Condition | Encoding
GC content | From 0.3 to 0.6 | (1, 0, -1, -1) at column (n + 1)
GC content | Otherwise | (0, 1, -1, -1) at column (n + 1)
ΔT | ≥ threshold | (1, 0, -1, -1) at column (n + 2)
ΔT | Otherwise | (0, 1, -1, -1) at column (n + 2)
GC stretch | ≥ threshold | (1, 0, -1, -1) at column (n + 3)
GC stretch | Otherwise | (0, 1, -1, -1) at column (n + 3)
A/Us at five positions of the 5' end | ≥ threshold | (1, 0, -1, -1) at column (n + 4)
A/Us at five positions of the 5' end | Otherwise | (0, 1, -1, -1) at column (n + 4)
A/Us at seven positions of the 5' end | ≥ threshold | (1, 0, -1, -1) at column (n + 5)
A/Us at seven positions of the 5' end | Otherwise | (0, 1, -1, -1) at column (n + 5)

Encoding design rules to views: This step encodes each design rule r_i (i = 1, ..., k) by a matrix T_i (view T_i) of size 4 × (n + 5), in which the rows represent the nucleotide types and the n + 5 columns represent the n nucleotides and the five siRNA characteristics. Column j (j = 1, ..., n) of the matrix gives the knockdown efficacy of nucleotides A, C, G, and U at position j, and the last five columns describe the knockdown efficacy associated with the five siRNA characteristics.

The knockdown efficacy values of view T_i have to satisfy the constraints of the corresponding design rule. Design rule r_i propositionally describes the occurrence or absence of nucleotides at different positions of effective siRNAs, as well as the other siRNA characteristics mentioned above. Thus, if rule r_i states the occurrence (or absence) of some nucleotides at the j-th position, their corresponding values in view T_i should be greater (or smaller) than the other values in column j. Similarly, if rule r_i involves the j-th characteristic, the corresponding value in column n + j of matrix T_i should be greater than the other values in that column.

For example, consider a rule r and its encoding matrix T, where the rule states that at position 19 nucleotide A is effective and nucleotide C is ineffective. This means that the knockdown efficacy of nucleotide A is larger than that of the other nucleotides, and the knockdown efficacy of nucleotide C is smaller than that of the other nucleotides. T[1,19], T[2,19],
T[3,19], and T[4,19] are the knockdown efficacies of A, C, G, and U, respectively. The rule at position 19 can be expressed as the following specific constraints on matrix T:
• T[2,19] − T[1,19] < 0, i.e., A is more effective than C
• T[3,19] − T[1,19] < 0, i.e., A is more effective than G
• T[4,19] − T[1,19] < 0, i.e., A is more effective than U
• T[2,19] − T[3,19] < 0, i.e., C is less effective than G
• T[2,19] − T[4,19] < 0, i.e., C is less effective than U

Let G_i (i = 1, ..., k) be the set of specific constraints of rule r_i on matrix T_i, where each constraint in G_i has the form T_i[p, j] − T_i[q, j] < 0, with rows p, q ∈ {1, ..., 4} and column j ∈ {1, ..., n + 5}.

TABLE II. THE ENCODING MATRIX M OF SIRNA SEQUENCE GAAAGGAAUUGUAUAAAUC. THE FIRST 19 COLUMNS ENCODE THE 19 NUCLEOTIDES OF THE SIRNA SEQUENCE; THE LAST 5 COLUMNS ENCODE THE FIVE CHARACTERISTICS OF THE SIRNA SEQUENCE.
Position | 1 | 2 | 3 | 4 | 5 | ... | 18 | 19 | 20 | 21 | 22 | 23 | 24
siRNA | G | A | A | A | G | ... | U | C | | | | |
A | 0 | 1 | 1 | 1 | 0 | ... | 0 | 0 | | | | |
C | 0 | 0 | 0 | 0 | 0 | ... | 0 | 1 | | | | |
G | 1 | 0 | 0 | 0 | 1 | ... | 0 | 0 | -1 | -1 | -1 | -1 | -1
U | 0 | 0 | 0 | 0 | 0 | ... | 1 | 0 | -1 | -1 | -1 | -1 | -1

Learning views: The siRNA set {s_1, s_2, ..., s_m} serves as the training set for learning the k views, i.e., for optimizing the k matrices T_1, ..., T_k. Learning views can be considered as a clustering problem [5] in which the k matrices are the centers of k clusters. Each encoding matrix M_i of siRNA s_i is assigned to view T_j with a membership value u_ij (i = 1, ..., m; j = 1, ..., k), meaning that an siRNA sequence can be generated by different views with different degrees of confidence. We employ the FCM algorithm [2] with k clusters to optimize the k views (matrices) and the membership values by minimizing the following objective function:

$$R = \sum_{i=1}^{m} \sum_{j=1}^{k} u_{ij}^{2} \, \lVert M_i - T_j \rVert_{Fro}^{2} \qquad (1)$$

subject to:
1) the k constraint sets G_1, ..., G_k, where G_i is the set of specific constraints of rule r_i on matrix T_i;
2) Σ_{j=1}^{k} u_ij = 1 for i = 1, ..., m,

where ||·||_Fro denotes the Frobenius norm of a matrix. The membership values and matrices can be obtained by an iterative method in which each column of a matrix is derived while the other columns are kept fixed. The solution is computed as follows:

$$u_{ij} = \frac{1}{\sum_{z=1}^{k} \left( \frac{\lVert M_i - T_j \rVert_{Fro}}{\lVert M_i - T_z \rVert_{Fro}} \right)^{2}} \qquad (2)$$

$$T_j[\cdot, c] = \frac{\sum_{i=1}^{m} u_{ij}^{2} \, M_i[\cdot, c]}{\sum_{i=1}^{m} u_{ij}^{2}} \qquad (3)$$

where T_j[·, c] is the vector corresponding to the c-th column of matrix T_j.

Algorithm 1 alternates two steps: computing the membership values of the encoding matrices and updating the matrices corresponding to the views. These two steps are repeated until the membership values and views meet the convergence criteria.

Algorithm 1: Multi-view learning
Input: a dataset S = {s_1, s_2, ..., s_m}, where s_i (i = 1, ..., m) are siRNA sequences of length n; a set R of k design rules; the maximum number of iterations tMax.
Output: k matrices (views) T_1, T_2, ..., T_k.
main
    Encode siRNAs by content
    for r_i in R
        Form the set of constraints G_i based on r_i
        Initialize the view T_i satisfying G_i
    end for
    t = 0  {iterative step}
    repeat
        t ← t + 1
        {compute membership values}
        for i = 1 to m
            for j = 1 to k
                compute u_ij^(t) using equation (2)
            end for
        end for
        {update views}
        for j = 1 to k
            for c = 1 to n + 5
                compute T_j[·, c]^(t) using equation (3)
                if T_j[·, c]^(t) satisfies the constraints G_j then
                    T_j[·, c] ← T_j[·, c]^(t)
                end if
            end for
        end for
    until (Σ_{p=1..k} ||T_p^(t) − T_p^(t−1)||_Fro / ||T_p^(t−1)||_Fro ≤ ε and Σ_{p=1..k, q=1..m} |u_qp^(t) − u_qp^(t−1)| / u_qp^(t−1) ≤ ε) or (t > tMax)
end main

Encoding siRNAs by views: To obtain the final representation of siRNAs, the learned views are combined and other properties of siRNAs are added. In particular, nucleotides A, C, G, and U at position c (c = 1, ..., n) of an siRNA are represented by the vectors (T_1[1, c], ..., T_k[1, c]), (T_1[2, c], ..., T_k[2, c]), (T_1[3, c], ..., T_k[3, c]), and (T_1[4, c], ..., T_k[4, c]), respectively. If the GC content of an siRNA satisfies its condition (see Table I), it is represented by the vector (T_1[1, n + 1], ..., T_k[1, n + 1]); otherwise, it is represented by (T_1[2, n + 1], ..., T_k[2, n + 1]). The four other characteristics are encoded in the same way. In short, each siRNA sequence is encoded by a vector of length k × (n + 5). In addition, five other properties of siRNAs (melting temperature, molecular weight, and three thermodynamic properties: enthalpy, entropy, and free energy), calculated by the nearest-neighbor method [48], are added to the final representation. As a result, each siRNA is encoded by a vector of length k × (n + 5) + 5.

B. Learning a predictive model

This step builds a predictive model on the new representation of siRNAs. The elastic net method [49] is applied to build the model for predicting the knockdown efficacy of siRNAs. The method not only fits the model but also selects the important features that affect the target label: through the lasso regularization term of the elastic net, significant variables, i.e., important characteristics that influence the knockdown efficacy of siRNAs, are detected.

III. EXPERIMENTAL EVALUATION

This section presents an experimental evaluation comparing the proposed method MVRM (multiple-view-based regression model) with recent methods for siRNA knockdown efficacy prediction on four benchmark datasets:
• The Huesken dataset of 2431 siRNA sequences targeting 34 human and rodent mRNAs, commonly divided into the training set HU train of 2182 siRNAs and the testing set HU test of 249 siRNAs [12].
• The Reynolds dataset of 240 siRNAs [21].
• The Vickers dataset of 76 siRNA sequences targeting two genes [29].
• The Harborth dataset of 44 siRNA sequences targeting one gene [9].

We employed five siRNA design rules (k = 5) to learn five views of siRNAs: the Reynolds, Ui-Tei, Amarzguioui, Hsieh, and Jagla rules [21], [30], [1], [11], [15]. The HU train set was used to learn these views and the MVRM model, and the other datasets were used for comparative evaluation.

The tuning parameter λ of the objective function of the model was estimated by 10-fold cross-validation. Figure 2 shows the upper and lower bounds of the mean squared error between the predicted and experimental efficacies during cross-validation. We used five design rules and five other properties of siRNAs to learn our model, so the final representation has 5 × 24 + 5 = 125 features. After learning the model, 78 important features that influence the knockdown efficacy of siRNAs were retained. Figure 1 shows the influence of the 125 features: during the learning process, the coefficients of less important features are driven to zero. Based on the coefficients of the MVRM model, important features can easily be selected for designing effective siRNAs.

Fig. 1. Coefficients of the MVRM model, showing the importance of the 125 features.
Fig. 2. Upper and lower curves of the mean squared error as a function of λ.

TABLE III. THE R VALUES OF 18 MODELS AND MVRM ON THREE INDEPENDENT DATASETS
Algorithm | R Reynolds (244 si/7 g) | R Vickers (76 si/2 g) | R Harborth (44 si/1 g)
GPboot [42] | 0.55 | 0.35 | 0.43
Ui-Tei [30] | 0.47 | 0.58 | 0.31
Amarzguioui [1] | 0.45 | 0.47 | 0.34
Hsieh [11] | 0.03 | 0.15 | 0.17
Takasaki [43] | 0.03 | 0.25 | 0.01
Reynolds 1 [21] | 0.35 | 0.47 | 0.23
Reynolds 2 [21] | 0.37 | 0.44 | 0.23
Schwarz [37] | 0.29 | 0.35 | 0.01
Khvorova [44] | 0.15 | 0.19 | 0.11
Stockholm 1 [45] | 0.05 | 0.18 | 0.28
Stockholm 2 [45] | 0.00 | 0.15 | 0.41
Tree [45] | 0.11 | 0.43 | 0.06
Luo [46] | 0.33 | 0.27 | 0.40
i-Score [14] | 0.54 | 0.58 | 0.43
BIOPREDsi [12] | 0.53 | 0.57 | 0.51
DSIR [28] | 0.54 | 0.49 | 0.51
Katoh [47] | 0.40 | 0.43 | 0.44
SVM [23] | 0.54 | 0.52 | 0.54
BiLTR [3] | 0.57 | 0.58 | 0.57
MVRM | 0.6 | 0.614 | 0.52

The MVRM model was compared with most state-of-the-art methods. For a fair comparison, we carried out experiments on MVRM under the same conditions as reported for the other methods. Concretely, the comparative evaluation is as follows.

1) Comparison of MVRM with BIOPREDsi [12], ThermoComposition21 [24], DSIR [28], SVM [23], and BiLTR [3] when trained on HU train and tested on HU test. The Pearson correlation coefficients of these five models are 0.66, 0.66, 0.67, 0.80, and 0.67, respectively, and that of MVRM on HU test is 0.66. The performance of MVRM is thus similar to that of the other models but lower than that of the SVM model. The reason is that the SVM model
uses positional features and 3D information. These 3D features capture the flexibility and strain of siRNAs, which can be important characteristics for the siRNAs of the HU test set, extracted from human (NCI-H1299, HeLa) and rodent genes [12].

2) Comparison of MVRM with 19 models, including BIOPREDsi, DSIR, SVM, and BiLTR, when all models were trained on HU train and tested on the three independent datasets of Reynolds, Vickers, and Harborth. The Pearson correlation coefficients of the MVRM model are 0.6, 0.614, and 0.52 on the Reynolds, Vickers, and Harborth datasets, respectively. Table III shows that MVRM outperformed the first 17 models on all three datasets, and it was better than the SVM and BiLTR models on the first two datasets. MVRM was not as good as BiLTR on the Harborth dataset. However, one limitation of the BiLTR model is the computational cost of training its transformation matrices and parameters: training BiLTR took days, while training MVRM took only about five minutes. Moreover, unlike most other models, MVRM produces stable results across the independent siRNA datasets.

In these comparative studies, the performance of MVRM was more stable and, in most cases, higher than that of the other models. The reason is that previous siRNA representations can be unsuitable for siRNAs provided by different groups under different protocols. In our proposed method, the representation is enriched by incorporating background knowledge from siRNA design rules; therefore, it can capture the distribution diversity of siRNA data.

As presented in the experimental comparative evaluation, MVRM achieved better results than most other methods in predicting siRNA knockdown efficacy.

IV. CONCLUSION

In this paper, we have proposed a stable and accurate method to predict the knockdown efficacy of siRNA sequences. To enrich the siRNA representation, views of siRNAs are constructed and learned by incorporating background knowledge from available design rules. By combining these views, we obtain a representation suited to siRNAs that belong to different distributions because they were produced by different research groups under different protocols.

The experimental comparative evaluation on commonly used datasets, following standard evaluation procedures in different contexts, shows that the proposed method achieved promising results. There are several reasons for this. First, it is expensive to experimentally measure the knockdown efficacy of siRNAs, so most available datasets are relatively small, which limits the attainable results. Second, MVRM benefits from incorporating domain knowledge (siRNA design rules) found experimentally on different datasets. Third, MVRM is generic and can easily incorporate new design rules when they are discovered. When our proposed model was tested on the three independent datasets generated by different empirical experiments, MVRM achieved the best results on the Reynolds and Vickers datasets. Additionally, on the Harborth dataset, the performance of MVRM was higher than that of all other models except SVM and BiLTR (Table III).

ACKNOWLEDGMENT

Bui Ngoc Thang and Le Sy Vinh are financially supported by the Vietnam National Foundation for Science and Technology (102.01-2013.04).

REFERENCES

[1] Amarzguioui M, Prydz H, An algorithm for selection of functional siRNA sequences, Biochem Biophys Res Commun, 316:1050–1058, 2004.
[2] Bezdek JC, Ehrlich R, Full W, FCM: The fuzzy c-means clustering algorithm, Computers & Geosciences, 10(2):191–203, 1984.
[3] Bui TN, Ho TB, Tatsuo K, A semi-supervised tensor regression model for siRNA efficacy prediction, BMC Bioinformatics, 16:80, 2015.
[4] Chang PC, Pan WJ, Chen CW, Chen YT, Chu YW, A design engine of siRNA that integrates SVMs prediction and feature filters, Biocatalysis and Agricultural Biotechnology, 1:129–134, 2012.
[5] Chang X, Dacheng T, Chao X, A survey on multi-view learning, CoRR abs/1304.5634, 2013.
[6] Elbashir SM, Lendeckel W, Tuschl T, RNA interference is mediated by 21- and 22-nucleotide RNAs, Genes Dev, 15:188–200, 2001.
[7] Elbashir SM, Harborth J, Lendeckel W, Yalcin A, Weber K, Tuschl T, Duplexes of 21-nucleotide RNAs mediate RNA interference in cultured mammalian cells, Nature, 411:494–498, 2001.
[8] Gong W, Ren Y, Xu Q, Wang Y, Lin D, Zhou H, Li T, Integrated siRNA design based on surveying of features associated with high RNAi effectiveness, BMC Bioinformatics, 7:516, 2006.
[9] Harborth J, Elbashir SM, Vandenburgh K, Manninga H, Scaringe SA, Weber K, Tuschl T, Sequence, chemical, and structural variation of small interfering RNAs and short hairpin RNAs and the effect on mammalian gene silencing, Antisense Nucleic Acid Drug Dev, 13:83–105, 2003.
[10] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.
[11] Hsieh AC, Bo R, Manola J, Vazquez F, Bare O, Khvorova A, Scaringe S, Sellers WR, A library of siRNA duplexes targeting the phosphoinositide 3-kinase pathway: determinants of gene silencing for use in cell-based screens, Nucleic Acids Res, 32:893–901, 2004.
[12] Huesken D, Lange J, Mickanin C, Weiler J, Asselbergs F, Warner J, Mellon B, Engel S, Rosenberg A, Cohen D, Labow M, Reinhardt M, Natt F, Hall J, Design of a genome-wide siRNA library using an artificial neural network, Nature Biotechnology, 23:995–1001, 2005.
[13] Hutvagner G, McLachlan J, Balint E, Tuschl T, Zamore PD, A cellular function for the RNA interference enzyme Dicer in small temporal RNA maturation, Science, 293:834–838, 2001.
[14] Ichihara M, Murakumo Y, Masuda A, Matsuura T, Asai N, Jijiwa M, Ishida M, Shinmi J, Yatsuya H, Qiao S, et al., Thermodynamic instability of siRNA duplex is a prerequisite for dependable prediction of siRNA activities, Nucleic Acids Res, 35:e123, 2007.
[15] Jagla B, Aulner N, Kelly PD, Song D, Volchuk A, Zatorski A, Shum D, Mayer T, De Angelis DA, Ouerfelli O, Rutishauser U, Rothman JE, Sequence characteristics of functional siRNAs, RNA, 11:864–872, 2005.
[16] Klingelhoefer JW, Moutsianas L, Holmes CC, Approximate Bayesian feature selection on a large meta-dataset offers novel insights on factors that effect siRNA potency, Bioinformatics, 25:1594–1601, 2009.
[17] Meister G, Tuschl T, Mechanisms of gene silencing by double-stranded RNA, Nature, 431:343–349, 2004.
[18] Mysara M, Elhefnawi M, Garibaldi JM, MysiRNA: improving siRNA efficacy prediction using a machine-learning model combining multi-tools and whole stacking energy, J Biomed Inform, 45:528–534, 2012.
[19] Qiu S, Lane T, A framework for multiple kernel support vector regression and its applications to siRNA efficacy prediction, IEEE/ACM Trans Comput Biol Bioinform, 6:190–199, 2009.
[20] Ren Y, Gong W, Xu Q, Zheng X, Lin D, et al., siRecords: an extensive database of mammalian siRNAs with efficacy ratings, Bioinformatics, 22:1027–1028, 2006.
[21] Reynolds A, Leake D, Boese Q, Scaringe S, Marshall WS, Khvorova A, Rational siRNA design for RNA interference, Nat Biotechnol, 22:326–330, 2004.
[22] Scherer LJ, Rossi JJ, Approaches for the sequence-specific knockdown of mRNA, Nat Biotechnol, 21:1457–1465, 2003.
[23] Sciabola S, Cao Q, Orozco M, Faustino I, Stanton RV, Improved nucleic acid descriptors for siRNA efficacy prediction, Nucleic Acids Res, 41:1383–1394, 2013.
[24] Shabalina SA, Spiridonov AN, Ogurtsov AY, Computational models with thermodynamic and composition features improve siRNA design, BMC Bioinformatics, 7:65, 2006.
[25] Sudarsana LR, Sarojamma V, Ramakrishna V, Future of RNAi in medicine: a review, World J Med Sci, 2:1–14, 2007.
[26] Takasaki S, Methods for selecting effective siRNA target sequences using a variety of statistical and analytical techniques, Methods Mol Biol, 942:17–55, 2013.
[27] Tuschl T, Zamore PD, Lehmann R, Bartel DP, Sharp PA, Targeted mRNA degradation by double-stranded RNA in vitro, Genes Dev, 13:3191–3197, 1999.
[28] Vert JP, Foveau N, Lajaunie C, Vandenbrouck Y, An accurate and interpretable model for siRNA efficacy prediction, BMC Bioinformatics, 7:520, 2006.
[29] Vickers TA, Koo S, Bennett CF, Crooke ST, Dean NM, Baker BF, Efficient reduction of target RNAs by small interfering RNA and RNase H-dependent antisense agents: a comparative analysis, J Biol Chem, 278:7108–7118, 2003.
[30] Ui-Tei K, Naito Y, Takahashi F, Haraguchi T, Ohki-Hamazaki H, Juni A, Ueda R, Saigo K, Guidelines for the selection of highly effective siRNA sequences for mammalian and chick RNA interference, Nucleic Acids Res, 32:936–948, 2004.
[31] Ui-Tei K, Optimal choice of functional and off-target effect-reduced siRNAs for RNAi therapeutics, Front Genet, 4:107, 2013.
[32] Angart P, Vocelle D, Chan C, Walton SP, Design of siRNA therapeutics from the molecular scale, Pharmaceuticals, 6:440–468, 2013.
[33] Gavrilov K, Saltzman WM, Therapeutic siRNA: principles, challenges, and strategies, Yale J Biol Med, 85:187–200, 2012.
[34] Mutisya D, Selvam C, Lunstad BD, Pallan PS, Haas A, Leake D, Egli M, Rozners E, Amides are excellent mimics of phosphate internucleoside linkages and are well tolerated in short interfering RNAs, Nucleic Acids Res, 42(10):6542–6551, 2014.
[35] Deng Y, Wang CC, Choy KW, Du Q, Chen J, Wang Q, Li L, Chung TK, Tang T, Therapeutic potentials of gene silencing by RNA interference: principles, challenges, and new strategies, Gene, 538(2):217–227, 2014.
[36] Matveeva O, Nechipurenko Y, Rossi L, Moore B, Ogurtsov AY, Atkins JF, et al., Comparison of approaches for rational siRNA design leading to a new efficient and transparent method, Access, 35:1–10, 2007.
[37] Schwarz DS, Hutvagner G, Du T, Xu Z, Aronin N, Zamore PD, Asymmetry in the assembly of the RNAi enzyme complex, Cell, 115(2):199–208, 2003.
[38] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115(2):209–216, 2003.
[39] Schramm G, Ramey R, siRNA design including secondary structure target site prediction, Nature Methods, 2(8), doi:10.1038/nmeth780, 2005 (Application Notes).
[40] Hannon GJ, Rossi JJ, Unlocking the potential of the human genome with RNA interference, Nature, 431:371–378, 2004.
[41] Qi L, Han Z, Ruixin Z, Ying X, Zhiwei C, Reconsideration of in silico siRNA design from a perspective of heterogeneous data integration: problems and solutions, Brief Bioinform, 15:292–305, 2014.
[42] Saetrom P, Predicting the efficacy of short oligonucleotides in antisense and RNAi experiments with boosted genetic programming, Bioinformatics, 20(17):3055–3063, 2004.
[43] Takasaki S, Kotani S, Konagaya A, An effective method for selecting siRNA target sequences in mammalian cells, Cell Cycle, 3(6):790–795, 2004.
[44] Khvorova A, Reynolds A, Jayasena SD, Functional siRNAs and miRNAs exhibit strand bias, Cell, 115:209–216, 2003.
[45] Chalk A, Wahlestedt C, Sonnhammer E, Improved and automated prediction of effective siRNA, Biochem Biophys Res Commun, 319(1):264–274, 2004.
[46] Luo K, Chang D, The gene-silencing efficiency of siRNA is strongly dependent on the local structure of mRNA at the targeted region, Biochem Biophys Res Commun, 318(1):303–310, 2004.
[47] Katoh T, Suzuki T, Specific residues at every third position of siRNA shape its efficient RNAi activity, Nucleic Acids Res, 35:e27, 2007.
[48] SantaLucia J Jr, A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc Natl Acad Sci USA, 95:1460–1465, 1998.
[49] Zou H, Hastie T, Regularization and variable selection via the elastic net, Journal of the Royal Statistical Society, Series B, 67(2):301–320, 2005.
[50] Kopka H, Daly PW, A Guide to LaTeX, 3rd ed., Harlow, England: Addison-Wesley, 1999.
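To make the rule-to-constraint step of Section II-A concrete, the following Python sketch turns a positional design rule into a constraint set G_i and checks a candidate view against it. It reproduces the position-19 example (A effective, C ineffective), which yields exactly the five constraints listed there. The helper names `positional_constraints` and `satisfies`, and the (p, q, j) tuple layout with 1-based indices, are illustrative assumptions, not the authors' implementation.

```python
# Rows of a view matrix, numbered 1..4 as in the paper: A, C, G, U.
ROWS = {"A": 1, "C": 2, "G": 3, "U": 4}


def positional_constraints(column, effective="", ineffective=""):
    """Hypothetical helper: return constraints (p, q, j), each meaning T[p, j] - T[q, j] < 0.

    Every nucleotide in `effective` must score higher than every nucleotide
    outside that set; every nucleotide in `ineffective` must score lower than
    every nucleotide outside that set. Exact duplicates are skipped.
    """
    constraints = []
    for good in effective:
        for other in ROWS:
            if other not in effective:
                constraints.append((ROWS[other], ROWS[good], column))
    for bad in ineffective:
        for other in ROWS:
            pair = (ROWS[bad], ROWS[other], column)
            if other not in ineffective and pair not in constraints:
                constraints.append(pair)
    return constraints


def satisfies(T, constraints):
    """Check a view T (list of 4 rows, stored 0-based) against 1-based constraints."""
    return all(T[p - 1][j - 1] - T[q - 1][j - 1] < 0 for p, q, j in constraints)


# Position-19 example: A effective, C ineffective.
G = positional_constraints(19, effective="A", ineffective="C")
print(G)  # [(2, 1, 19), (3, 1, 19), (4, 1, 19), (2, 3, 19), (2, 4, 19)]
```

Treating `effective` as a set (rather than one nucleotide) lets a rule such as "A or U at position 19" produce consistent constraints (C and G below both A and U) instead of contradictory ones.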
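The alternating updates of equations (2) and (3) in Section II-A, with the constraint-gated column update of Algorithm 1, can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: matrices are lists of rows; each constraint is a 1-based tuple (p, q, col) meaning T_j[p, col] − T_j[q, col] < 0; a sequence that coincides exactly with a view receives full membership there (a case equation (2) leaves undefined); only nucleotide columns are encoded; and the toy sequences, rules, and initial views are hypothetical.

```python
import math


def norm(A):
    """Frobenius norm of a matrix stored as a list of rows."""
    return math.sqrt(sum(x * x for row in A for x in row))


def distance(A, B):
    """Frobenius norm of A - B, i.e. ||M_i - T_j||_Fro in equation (1)."""
    return math.sqrt(sum((a - b) ** 2 for ra, rb in zip(A, B) for a, b in zip(ra, rb)))


def memberships(Ms, Ts):
    """Equation (2): u_ij = 1 / sum_z (d_ij / d_iz)^2, so each row of U sums to 1."""
    U = []
    for M in Ms:
        d = [distance(M, T) for T in Ts]
        zeros = d.count(0.0)
        if zeros:
            # Assumption: M equals one or more views; share membership among them.
            U.append([1.0 / zeros if x == 0.0 else 0.0 for x in d])
        else:
            U.append([1.0 / sum((dj / dz) ** 2 for dz in d) for dj in d])
    return U


def update_views(Ms, Ts, U, Gs):
    """Equation (3) column by column; keep the old column if G_j would be violated."""
    new_Ts = []
    for j, T in enumerate(Ts):
        w = [U[i][j] ** 2 for i in range(len(Ms))]
        total = sum(w)
        T_new = [row[:] for row in T]
        if total == 0.0:
            new_Ts.append(T_new)
            continue
        for c in range(len(T[0])):
            cand = [sum(wi * M[r][c] for wi, M in zip(w, Ms)) / total
                    for r in range(len(T))]
            if all(cand[p - 1] - cand[q - 1] < 0
                   for p, q, col in Gs[j] if col == c + 1):
                for r in range(len(T)):
                    T_new[r][c] = cand[r]
        new_Ts.append(T_new)
    return new_Ts


def learn_views(Ms, Ts, Gs, t_max=100, eps=1e-6):
    """Alternate equations (2) and (3) until the relative changes of views and
    memberships are both <= eps, or t_max iterations have run (Algorithm 1)."""
    U_prev = None
    for _ in range(t_max):
        U = memberships(Ms, Ts)
        new_Ts = update_views(Ms, Ts, U, Gs)
        dT = sum(distance(Tn, To) / max(norm(To), 1e-12) for Tn, To in zip(new_Ts, Ts))
        Ts = new_Ts
        if U_prev is not None:
            dU = sum(abs(a - b) / max(b, 1e-12)
                     for ra, rb in zip(U, U_prev) for a, b in zip(ra, rb))
            if dT <= eps and dU <= eps:
                break
        U_prev = U
    return Ts, U


def one_hot(seq):
    """Nucleotide columns only (rows A, C, G, U); characteristic columns omitted."""
    return [[1.0 if nt == base else 0.0 for nt in seq] for base in "ACGU"]


# Hypothetical toy data: rule 1 says A is effective at position 1, rule 2 says U is.
Ms = [one_hot(s) for s in ["AC", "AG", "UC", "UU"]]
G1 = [(2, 1, 1), (3, 1, 1), (4, 1, 1)]
G2 = [(1, 4, 1), (2, 4, 1), (3, 4, 1)]
T1 = [[0.7, 0.25], [0.1, 0.25], [0.1, 0.25], [0.1, 0.25]]  # satisfies G1
T2 = [[0.1, 0.25], [0.1, 0.25], [0.1, 0.25], [0.7, 0.25]]  # satisfies G2
views, U = learn_views(Ms, [T1, T2], [G1, G2])
```

Because a column update is accepted only when it keeps G_j satisfied, views initialized inside their constraint sets stay inside them; this is how the sketch interprets the "adaptive" part of the FCM procedure.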

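The step "Encoding siRNAs by views" in Section II-A can likewise be sketched in Python: for each position, the k view entries in the row of the observed nucleotide are concatenated; each Table I characteristic contributes row 1 or row 2 of its column depending on whether its condition holds; and five precomputed properties are appended, giving k × (n + 5) + 5 = 125 features for k = 5 and n = 19. The column-by-column feature order, the inclusive 0.3–0.6 GC bounds, and passing the other four characteristic flags and the five thermodynamic properties in as inputs (their thresholds and the nearest-neighbor computation are not reproduced) are assumptions of this sketch; the dummy views are placeholders, not learned values.

```python
# Row index (0-based) of each nucleotide in a 4 x (n + 5) view matrix.
NUCLEOTIDE_ROW = {"A": 0, "C": 1, "G": 2, "U": 3}


def gc_content_ok(seq, low=0.3, high=0.6):
    """Table I GC-content condition; treating both bounds as inclusive is an assumption."""
    gc = sum(nt in "GC" for nt in seq) / len(seq)
    return low <= gc <= high


def encode_by_views(seq, views, flags, extra):
    """Build the k*(n+5)+5 feature vector of one siRNA.

    views: k learned matrices, each 4 x (n + 5), stored as lists of rows.
    flags: five booleans, True if the matching Table I condition holds
           (row 1 of that column is used), False otherwise (row 2).
    extra: five precomputed properties (melting temperature, molecular
           weight, enthalpy, entropy, free energy), appended unchanged.
    """
    n = len(seq)
    if any(len(T) != 4 or len(T[0]) != n + 5 for T in views):
        raise ValueError("each view must be 4 x (n + 5)")
    if len(flags) != 5 or len(extra) != 5:
        raise ValueError("expected five characteristic flags and five properties")
    features = []
    for c, nt in enumerate(seq):  # nucleotide columns: k values per position
        row = NUCLEOTIDE_ROW[nt]
        features.extend(T[row][c] for T in views)
    for j, satisfied in enumerate(flags):  # characteristic columns n+1 .. n+5
        row = 0 if satisfied else 1
        features.extend(T[row][n + j] for T in views)
    features.extend(extra)
    return features


seq = "GAAAGGAAUUGUAUAAAUC"  # the Table II example, n = 19
views = [[[float(v)] * 24 for _ in range(4)] for v in range(5)]  # k = 5 dummy views
flags = [gc_content_ok(seq), True, False, True, False]  # last four flags are placeholders
vec = encode_by_views(seq, views, flags, [1.0, 2.0, 3.0, 4.0, 5.0])
print(len(vec))  # 125 = 5 * (19 + 5) + 5
```

For the Table II sequence the GC content is 5/19 ≈ 0.26, so under this sketch's reading of Table I the GC-content column contributes row 2 of each view. The resulting 125-dimensional vectors are what the elastic net of Section II-B would be fitted on.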