Data Mining and Knowledge Discovery Handbook, 2nd Edition (Part 20)

... such as supervised learning (lr6, lr12, lr15), unsupervised learning (lr13, lr8, lr5, lr16), and genetic algorithms (lr17, lr11, lr1, lr4).

References

Almuallim H., An Efficient Algorithm for Optimal Pruning of Decision Trees. Artificial Intelligence 83(2): 347-362, 1996.
Almuallim H. and Dietterich T.G., Learning Boolean concepts in the presence of many irrelevant features. Artificial Intelligence, 69(1-2): 279-306, 1994.
Alsabti K., Ranka S. and Singh V., CLOUDS: A Decision Tree Classifier for Large Datasets, Conference on Knowledge Discovery and Data Mining (KDD-98), August 1998.
Attneave F., Applications of Information Theory to Psychology. Holt, Rinehart and Winston, 1959.
Arbel, R. and Rokach, L., Classifier evaluation under limited resources, Pattern Recognition Letters, 27(14): 1619-1631, 2006, Elsevier.
Averbuch, M., Karson, T., Ben-Ami, B., Maimon, O. and Rokach, L., Context-sensitive medical information retrieval, The 11th World Congress on Medical Informatics (MEDINFO 2004), San Francisco, CA, September 2004, IOS Press, pp. 282-286.
Baker E. and Jain A. K., On feature ordering in practice and some finite sample effects. In Proceedings of the Third International Joint Conference on Pattern Recognition, pp. 45-49, San Diego, CA, 1976.
BenBassat M., Myopic policies in sequential classification. IEEE Trans. on Computing, 27(2):170-174, February 1978.
Bennett X. and Mangasarian O.L., Multicategory discrimination via linear programming. Optimization Methods and Software, 3:29-39, 1994.
Bratko I. and Bohanec M., Trading accuracy for simplicity in decision trees, Machine Learning 15: 223-250, 1994.
Breiman L., Friedman J., Olshen R. and Stone C., Classification and Regression Trees. Wadsworth Int. Group, 1984.
Brodley C. E. and Utgoff P. E., Multivariate decision trees. Machine Learning, 19:45-77, 1995.
Buntine W. and Niblett T., A Further Comparison of Splitting Rules for Decision-Tree Induction. Machine Learning, 8: 75-85, 1992.
Catlett J., Mega induction: Machine Learning on Very Large Databases, PhD thesis, University of Sydney, 1991.
Chan P.K. and Stolfo S.J., On the Accuracy of Meta-learning for Scalable Data Mining, J. Intelligent Information Systems, 8:5-28, 1997.
Cohen S., Rokach L. and Maimon O., Decision Tree Instance Space Decomposition with Grouped Gain-Ratio, Information Science, 177(17): 3592-3612, 2007.
Crawford S. L., Extensions to the CART algorithm. Int. J. of Man-Machine Studies, 31(2):197-217, August 1989.
Dietterich, T. G., Kearns, M. and Mansour, Y., Applying the weak learning framework to understand and improve C4.5. Proceedings of the Thirteenth International Conference on Machine Learning, pp. 96-104, San Francisco: Morgan Kaufmann, 1996.
Duda, R. and Hart, P., Pattern Classification and Scene Analysis, New York, Wiley, 1973.
Esposito F., Malerba D. and Semeraro G., A Comparative Analysis of Methods for Pruning Decision Trees. IEEE Transactions on Pattern Analysis and Machine Intelligence, 19(5):476-492, 1997.
Fayyad U. and Irani K. B., The attribute selection problem in decision tree generation. In Proceedings of the Tenth National Conference on Artificial Intelligence, pp. 104-110, Cambridge, MA: AAAI Press/MIT Press, 1992.
Ferri C., Flach P. and Hernández-Orallo J., Learning Decision Trees Using the Area Under the ROC Curve. In Claude Sammut and Achim Hoffmann, editors, Proceedings of the 19th International Conference on Machine Learning, pp. 139-146, Morgan Kaufmann, July 2002.
Fifield D. J., Distributed Tree Construction From Large Datasets, Bachelor's Honor Thesis, Australian National University, 1992.
Freitas X. and Lavington S. H., Mining Very Large Databases With Parallel Processing, Kluwer Academic Publishers, 1998.
Friedman J. H., A recursive partitioning decision rule for nonparametric classifiers. IEEE Trans. on Comp., C26:404-408, 1977.
Friedman, J. H., Multivariate Adaptive Regression Splines, The Annals of Statistics, 19:1-141, 1991.
Gehrke J., Ganti V., Ramakrishnan R. and Loh W., BOAT - Optimistic Decision Tree Construction. SIGMOD Conference 1999, pp. 169-180, 1999.
Gehrke J., Ramakrishnan R. and Ganti V., RainForest - A Framework for Fast Decision Tree Construction of Large Datasets, Data Mining and Knowledge Discovery, 4(2/3):127-162, 2000.
Gelfand S. B., Ravishankar C. S. and Delp E. J., An iterative growing and pruning algorithm for classification tree design. IEEE Transactions on Pattern Analysis and Machine Intelligence, 13(2):163-174, 1991.
Gillo M. W., MAID: A Honeywell 600 program for an automatised survey analysis. Behavioral Science 17: 251-252, 1972.
Hancock T. R., Jiang T., Li M. and Tromp J., Lower Bounds on Learning Decision Lists and Trees. Information and Computation 126(2): 114-122, 1996.
Holte R. C., Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11:63-90, 1993.
Hyafil L. and Rivest R.L., Constructing optimal binary decision trees is NP-complete. Information Processing Letters, 5(1):15-17, 1976.
Janikow, C.Z., Fuzzy Decision Trees: Issues and Methods, IEEE Transactions on Systems, Man, and Cybernetics, 28(1): 1-14, 1998.
John G. H., Robust linear discriminant trees. In D. Fisher and H. Lenz, editors, Learning From Data: Artificial Intelligence and Statistics V, Lecture Notes in Statistics, Chapter 36, pp. 375-385, Springer-Verlag, New York, 1996.
Kass G. V., An exploratory technique for investigating large quantities of categorical data. Applied Statistics, 29(2):119-127, 1980.
Kearns M. and Mansour Y., A fast, bottom-up decision tree pruning algorithm with near-optimal generalization, in J. Shavlik, ed., Machine Learning: Proceedings of the Fifteenth International Conference, Morgan Kaufmann Publishers, Inc., pp. 269-277, 1998.
Kearns M. and Mansour Y., On the boosting ability of top-down decision tree learning algorithms. Journal of Computer and Systems Sciences, 58(1): 109-128, 1999.
Kohavi R. and Sommerfield D., Targeting business users with decision table classifiers, in R. Agrawal, P. Stolorz and G. Piatetsky-Shapiro, eds, Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining, AAAI Press, pp. 249-253, 1998.
Langley, P. and Sage, S., Oblivious decision trees and abstract cases. In Working Notes of the AAAI-94 Workshop on Case-Based Reasoning, pp. 113-117, Seattle, WA: AAAI Press, 1994.
Li X. and Dubes R. C., Tree classifier design with a permutation statistic, Pattern Recognition 19:229-235, 1986.
Lim X., Loh W.Y. and Shih X., A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning 40:203-228, 2000.
Lin Y. K. and Fu K., Automatic classification of cervical cells using a binary tree classifier. Pattern Recognition, 16(1):69-80, 1983.
Loh W.Y. and Shih X., Split selection methods for classification trees. Statistica Sinica, 7: 815-840, 1997.
Loh W.Y. and Shih X., Families of splitting criteria for classification trees. Statistics and Computing 9:309-315, 1999.
Loh W.Y. and Vanichsetakul N., Tree-structured classification via generalized discriminant analysis. Journal of the American Statistical Association, 83: 715-728, 1988.
Lopez de Mantaras R., A distance-based attribute selection measure for decision tree induction, Machine Learning 6:81-92, 1991.
Lubinsky D., Algorithmic speedups in growing classification trees by using an additive split criterion. Proc. AI & Statistics 93, pp. 435-444, 1993.
Maimon O. and Rokach L., Data Mining by Attribute Decomposition with semiconductors manufacturing case study, in Data Mining for Design and Manufacturing: Methods and Applications, D. Braha (ed.), Kluwer Academic Publishers, pp. 311-336, 2001.
Maimon O. and Rokach L., Improving supervised learning by feature decomposition, Proceedings of the Second International Symposium on Foundations of Information and Knowledge Systems, Lecture Notes in Computer Science, Springer, pp. 178-196, 2002.
Maimon, O. and Rokach, L., Decomposition Methodology for Knowledge Discovery and Data Mining: Theory and Applications, Series in Machine Perception and Artificial Intelligence, Vol. 61, World Scientific Publishing, ISBN 981-256-079-3, 2005.
Martin J. K., An exact probability metric for decision tree splitting and stopping. Machine Learning, 28(2-3):257-291, 1997.
Mehta M., Rissanen J. and Agrawal R., MDL-Based Decision Tree Pruning. KDD 1995, pp. 216-221, 1995.
Mehta M., Agrawal R. and Rissanen J., SLIQ: A fast scalable classifier for Data Mining. In Proc. of the Fifth Int'l Conference on Extending Database Technology (EDBT), Avignon, France, March 1996.
Mingers J., An empirical comparison of pruning methods for decision tree induction. Machine Learning, 4(2):227-243, 1989.
Morgan J. N. and Messenger R. C., THAID: a sequential search program for the analysis of nominal scale dependent variables. Technical report, Institute for Social Research, Univ. of Michigan, Ann Arbor, MI, 1973.
Moskovitch R., Elovici Y. and Rokach L., Detection of unknown computer worms based on behavioral classification of the host, Computational Statistics and Data Analysis, 52(9):4544-4566, 2008.
Muller W. and Wysotzki F., Automatic construction of decision trees for classification. Annals of Operations Research, 52:231-247, 1994.
Murthy S. K., Automatic Construction of Decision Trees from Data: A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery, 2(4):345-389, 1998.
Naumov G.E., NP-completeness of problems of construction of optimal decision trees. Soviet Physics: Doklady, 36(4):270-271, 1991.
Niblett T. and Bratko I., Learning Decision Rules in Noisy Domains, Proc. Expert Systems 86, Cambridge: Cambridge University Press, 1986.
Olaru C. and Wehenkel L., A complete fuzzy decision tree technique, Fuzzy Sets and Systems, 138(2):221-254, 2003.
Pagallo, G. and Haussler, D., Boolean feature discovery in empirical learning, Machine Learning, 5(1): 71-99, 1990.
Peng Y., Intelligent condition monitoring using fuzzy inductive learning, Journal of Intelligent Manufacturing, 15(3): 373-380, June 2004.
Quinlan, J.R., Induction of decision trees, Machine Learning 1: 81-106, 1986.
Quinlan, J.R., Simplifying decision trees, International Journal of Man-Machine Studies, 27: 221-234, 1987.
Quinlan, J.R., Decision Trees and Multivalued Attributes, in J. Richards, ed., Machine Intelligence, Vol. 11, Oxford, England, Oxford Univ. Press, pp. 305-318, 1988.
Quinlan, J. R., Unknown attribute values in induction. In Segre, A. (Ed.), Proceedings of the Sixth International Machine Learning Workshop, Cornell, New York, Morgan Kaufmann, 1989.
Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, Los Altos, 1993.
Quinlan, J. R. and Rivest, R. L., Inferring Decision Trees Using The Minimum Description Length Principle. Information and Computation, 80:227-248, 1989.
Rastogi, R. and Shim, K., PUBLIC: A Decision Tree Classifier that Integrates Building and Pruning, Data Mining and Knowledge Discovery, 4(4):315-344, 2000.
Rissanen, J., Stochastic complexity and statistical inquiry. World Scientific, 1989.
Rokach, L., Decomposition methodology for classification tasks: a meta decomposer framework, Pattern Analysis and Applications, 9:257-271, 2006.
Rokach L., Genetic algorithm-based feature set partitioning for classification problems, Pattern Recognition, 41(5):1676-1700, 2008.
Rokach L., Mining manufacturing data using genetic algorithm-based feature set decomposition, Int. J. Intelligent Systems Technologies and Applications, 4(1):57-78, 2008.
Rokach, L. and Maimon, O., Theory and applications of attribute decomposition, IEEE International Conference on Data Mining, IEEE Computer Society Press, pp. 473-480, 2001.
Rokach L. and Maimon O., Feature Set Decomposition for Decision Trees, Journal of Intelligent Data Analysis, 9(2): 131-158, 2005b.
Rokach, L. and Maimon, O., Clustering methods, Data Mining and Knowledge Discovery Handbook, pp. 321-352, Springer, 2005.
Rokach, L. and Maimon, O., Data mining for improving the quality of manufacturing: a feature set decomposition approach, Journal of Intelligent Manufacturing, 17(3):285-299, Springer, 2006.
Rokach, L. and Maimon, O., Data Mining with Decision Trees: Theory and Applications, World Scientific Publishing, 2008.
Rokach L., Maimon O. and Lavi I., Space Decomposition In Data Mining: A Clustering Approach, Proceedings of the 14th International Symposium On Methodologies For Intelligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag, pp. 24-31, 2003.
Rokach, L., Maimon, O. and Averbuch, M., Information Retrieval System for Medical Narrative Reports, Lecture Notes in Artificial Intelligence 3055, pp. 217-228, Springer-Verlag, 2004.
Rokach, L., Maimon, O. and Arbel, R., Selective voting - getting more for less in sensor fusion, International Journal of Pattern Recognition and Artificial Intelligence, 20(3): 329-350, 2006.
Rounds, E., A combined non-parametric approach to feature selection and binary decision tree design, Pattern Recognition 12: 313-317, 1980.
Schlimmer, J. C., Efficiently inducing determinations: A complete and systematic search algorithm that uses optimal pruning. In Proceedings of the 1993 International Conference on Machine Learning, pp. 284-290, San Mateo, CA, Morgan Kaufmann, 1993.
Sethi, K. and Yoo, J. H., Design of multicategory, multifeature split decision trees using perceptron learning. Pattern Recognition, 27(7):939-947, 1994.
Shafer, J. C., Agrawal, R. and Mehta, M., SPRINT: A Scalable Parallel Classifier for Data Mining, Proc. 22nd Int. Conf. Very Large Databases, T. M. Vijayaraman, Alejandro P. Buchmann, C. Mohan and Nandlal L. Sarda (eds), pp. 544-555, Morgan Kaufmann, 1996.
Sklansky, J. and Wassel, G. N., Pattern classifiers and trainable machines. Springer-Verlag, New York, 1981.
Sonquist, J. A., Baker E. L. and Morgan, J. N., Searching for Structure. Institute for Social Research, Univ. of Michigan, Ann Arbor, MI, 1971.
Taylor P. C. and Silverman, B. W., Block diagrams and splitting criteria for classification trees. Statistics and Computing, 3(4):147-161, 1993.
Utgoff, P. E., Perceptron trees: A case study in hybrid concept representations. Connection Science, 1(4):377-391, 1989.
Utgoff, P. E., Incremental induction of decision trees. Machine Learning, 4: 161-186, 1989.
Utgoff, P. E., Decision tree induction based on efficient tree restructuring, Machine Learning, 29(1):5-44, 1997.
Utgoff, P. E. and Clouse, J. A., A Kolmogorov-Smirnoff Metric for Decision Tree Induction, Technical Report 96-3, University of Massachusetts, Department of Computer Science, Amherst, MA, 1996.
Wallace, C. S. and Patrick J., Coding decision trees, Machine Learning 11: 7-22, 1993.
Zantema, H. and Bodlaender H. L., Finding Small Equivalent Decision Trees is Hard, International Journal of Foundations of Computer Science, 11(2): 343-354, 2000.

10 Bayesian Networks

Paola Sebastiani (Department of Biostatistics, Boston University, sebas@bu.edu), Maria M. Abad (Software Engineering Department, University of Granada, Spain, mabad@ugr.es), and Marco F. Ramoni (Departments of Pediatrics and Medicine, Harvard University, marco ramoni@harvard.edu)

Summary. Bayesian networks are today one of the most promising approaches to Data Mining and knowledge discovery in databases. This chapter reviews the fundamental aspects of Bayesian networks and some of their technical aspects, with a particular emphasis on the methods to induce Bayesian networks from different types of data. Basic notions are illustrated through the detailed descriptions of two Bayesian network applications: one to survey data and one to marketing data.

Key words: Bayesian networks, probabilistic graphical models, machine learning, statistics.

10.1 Introduction

Born at the intersection of Artificial Intelligence, statistics, and probability, Bayesian networks (Pearl, 1988) are a representation formalism at the cutting edge of knowledge discovery and Data Mining (Heckerman, 1997, Madigan and Ridgeway, 2003, Madigan and York, 1995). Bayesian networks belong to a more general class of models called probabilistic graphical models (Whittaker, 1990, Lauritzen, 1996) that arise from the combination of graph theory and probability theory, and their success rests on their ability to handle complex probabilistic models by decomposing them into smaller, amenable components. A probabilistic graphical model is defined by a graph in which nodes represent stochastic variables and arcs represent dependencies among such variables. These arcs are annotated by probability distributions shaping the interaction between the linked variables. A probabilistic graphical model is called a Bayesian network when the graph connecting its variables is a directed acyclic graph (DAG). This graph represents conditional independence assumptions that are used to factorize the joint probability distribution of the network variables, thus making the process of learning from large databases amenable to computation. A Bayesian network induced from data can be used to investigate distant relationships between variables, as well as to make predictions and explanations, by computing the conditional probability distribution of one variable, given the values of some others.
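The directed-acyclic-graph requirement is easy to make concrete in code. The sketch below is not from the chapter: the node names are invented, and it only illustrates that a network structure can be stored as a map from each node to its parents and validated with a standard topological sort (graphlib is in the Python standard library from version 3.9).

```python
from graphlib import TopologicalSorter

# Hypothetical network structure: each node is mapped to the set of its parents.
parents = {
    "Age": set(),
    "Occupation": {"Age"},
    "Income": {"Age", "Occupation"},
    "Purchase": {"Income"},
}

# static_order() raises CycleError if the graph contains a directed cycle,
# so a successful call certifies that the structure is a valid DAG.
order = list(TopologicalSorter(parents).static_order())
print(order)  # e.g. ['Age', 'Occupation', 'Income', 'Purchase']
```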
The origins of Bayesian networks can be traced back as far as the early decades of the 20th century, when Sewell Wright developed path analysis to aid the study of genetic inheritance (Wright, 1923, Wright, 1934). In their current form, Bayesian networks were introduced in the early 1980s as a knowledge representation formalism to encode and use the information acquired from human experts in automated reasoning systems to perform diagnostic, predictive, and explanatory tasks (Pearl, 1988, Charniak, 1991). Their intuitive graphical nature and their principled probabilistic foundations were very attractive features for acquiring and representing information burdened by uncertainty. The development of amenable algorithms to propagate probabilistic information through the graph (Lauritzen and Spiegelhalter, 1988, Pearl, 1988) put Bayesian networks at the forefront of Artificial Intelligence research. Around the same time, the machine learning community came to the realization that the sound probabilistic nature of Bayesian networks provided straightforward ways to learn them from data. As Bayesian networks encode assumptions of conditional independence, the first machine learning approaches to Bayesian networks consisted of searching for conditional independence structures in the data and encoding them as a Bayesian network (Glymour et al., 1987, Pearl, 1988). Shortly thereafter, Cooper and Herskovitz (Cooper and Herskovitz, 1992) introduced a Bayesian method, further refined by Heckerman et al. (1995), to learn Bayesian networks from data. These results spurred the interest of the Data Mining and knowledge discovery community in the unique features of Bayesian networks (Heckerman, 1997): a highly symbolic formalism, originally developed to be used and understood by humans, well grounded in the sound foundations of statistics and probability theory, and able to capture complex interaction mechanisms and to perform prediction and classification.

10.2 Representation

A Bayesian network has two components: a directed acyclic graph and a probability distribution. Nodes in the directed acyclic graph represent stochastic variables, and arcs represent directed dependencies among variables that are quantified by conditional probability distributions. As an example, consider the simple scenario in which two variables control the value of a third. We denote the three variables with the letters A, B, and C, and we assume that each has two states, "True" and "False". The Bayesian network in Figure 10.1 describes the dependency of the three variables with a directed acyclic graph, in which the two arcs pointing to the node C represent the joint action of the two variables A and B. Also, the absence of any directed arc between A and B describes the marginal independence of the two variables, which become dependent when we condition on their common child C. Following the direction of the arrows, we call the node C a child of A and B, which become its parents.

Fig. 10.1. A network describing the impact of two variables (nodes A and B) on a third one (node C). Each node in the network is associated with a probability table that describes the conditional distribution of the node, given its parents.

The Bayesian network in Figure 10.1 lets us decompose the overall joint probability distribution of the three variables, which would require 2^3 - 1 = 7 parameters, into three probability distributions: one conditional distribution for the variable C given its parents, and two marginal distributions for the two parent variables A and B. These probabilities are specified by 1 + 1 + 4 = 6 parameters. The decomposition is one of the key factors that provide both a verbal and a human-understandable description of the system and allow us to efficiently store and handle this distribution, which grows exponentially with the number of variables in the domain. The second key factor is the use of conditional independence between the network variables to break down their overall distribution into connected modules.
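To make the parameter counting concrete, here is a minimal Python sketch (not from the chapter; the numeric probabilities are invented for illustration) that stores the three local tables p(A), p(B), and p(C | A, B) and recovers any joint probability as their product:

```python
# Factorization p(A, B, C) = p(A) p(B) p(C | A, B); all numbers are made up.
p_A = {True: 0.3, False: 0.7}                      # 1 free parameter
p_B = {True: 0.6, False: 0.4}                      # 1 free parameter
p_C_given_AB = {                                   # 4 free parameters, one per parent configuration
    (True, True): {True: 0.95, False: 0.05},
    (True, False): {True: 0.70, False: 0.30},
    (False, True): {True: 0.40, False: 0.60},
    (False, False): {True: 0.05, False: 0.95},
}

def joint(a, b, c):
    """Joint probability p(a, b, c) computed from the three local tables."""
    return p_A[a] * p_B[b] * p_C_given_AB[(a, b)][c]

# The unrestricted joint over three binary variables needs 2**3 - 1 = 7 parameters;
# the factorized form above needs only 1 + 1 + 4 = 6, and it still sums to one.
states = (True, False)
print(round(sum(joint(a, b, c) for a in states for b in states for c in states), 6))  # 1.0
```

The saving is modest for three variables, but the gap between the full joint and the factorized form grows exponentially with the number of variables.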
Suppose we have three random variables Y_1, Y_2, Y_3. Then Y_1 and Y_2 are independent given Y_3 if the conditional distribution of Y_1, given Y_2 and Y_3, is only a function of Y_3. Formally:

p(y_1 | y_2, y_3) = p(y_1 | y_3)

where p(y | x) denotes the conditional probability/density of Y, given X = x. We use capital letters to denote random variables, and small letters to denote their values. We also use the notation Y_1 ⊥ Y_2 | Y_3 to denote the conditional independence of Y_1 and Y_2 given Y_3.

Conditional and marginal independence are substantially different concepts. For example, two variables can be marginally independent, but they may be dependent when we condition on a third variable. The directed acyclic graph in Figure 10.1 shows this property: the two parent variables are marginally independent, but they become dependent when we condition on their common child. A well-known consequence of this fact is Simpson's paradox (Whittaker, 1990): two variables are independent, but once a shared child variable is observed they become dependent.

Fig. 10.2. A network encoding the conditional independence of Y_1, Y_2 given the common parent Y_3. The panel in the middle shows that the distribution of Y_2 changes with Y_1 and hence the two variables are conditionally dependent.

Conversely, two variables that are marginally dependent may be made conditionally independent by introducing a third variable. This situation is represented by the directed acyclic graph in Figure 10.2, which shows two children nodes (Y_1 and Y_2) with a common parent Y_3. In this case, the two children nodes are independent given the common parent, but they may become dependent when we marginalize the common parent out.
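The following sketch shows both effects numerically. It reuses the collider structure of Figure 10.1 with the same invented probabilities as above rather than any example from the chapter: A is independent of B on its own, but becomes dependent on B once their common child C is observed.

```python
# Collider A -> C <- B with made-up probabilities; p_C gives p(C=True | a, b).
p_A = {True: 0.3, False: 0.7}
p_B = {True: 0.6, False: 0.4}
p_C = {(True, True): 0.95, (True, False): 0.70, (False, True): 0.40, (False, False): 0.05}

def joint(a, b, c):
    """p(a, b, c) from the factorization p(A) p(B) p(C | A, B)."""
    return p_A[a] * p_B[b] * (p_C[(a, b)] if c else 1.0 - p_C[(a, b)])

def prob_A_true(evidence):
    """p(A=True | evidence), where evidence fixes some of B and C."""
    def mass(a):
        return sum(joint(a, b, c)
                   for b in (True, False) if evidence.get("B", b) == b
                   for c in (True, False) if evidence.get("C", c) == c)
    return mass(True) / (mass(True) + mass(False))

print(round(prob_A_true({"B": True}), 3))              # 0.3, equal to p(A=True): A is marginally independent of B
print(round(prob_A_true({"B": True, "C": True}), 3))   # ~0.50
print(round(prob_A_true({"B": False, "C": True}), 3))  # ~0.86: observing C makes A depend on B
```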
The overall list of marginal and conditional independencies represented by the directed acyclic graph is summarized by the local and global Markov properties (Lauritzen, 1996), which are exemplified in Figure 10.3 using a network of seven variables. The local Markov property states that each node is independent of its non-descendants given its parent nodes, and it leads to a direct factorization of the joint distribution of the network variables into the product of the conditional distribution of each variable Y_i given its parents Pa(Y_i). Therefore, the joint probability (or density) of the v network variables can be written as:

p(y_1, \ldots, y_v) = \prod_i p(y_i | pa(y_i)).    (10.1)

In this equation, pa(y_i) denotes a set of values of Pa(Y_i). This property is the core of many search algorithms for learning Bayesian networks from data. With this decomposition, the overall distribution is broken into modules that can be interrelated, and the network summarizes all significant dependencies without information disintegration. Suppose, for example, that the variables in the network in Figure 10.3 are all categorical. Then the joint probability p(y_1, \ldots, y_7) can be written as the product of seven conditional distributions:

p(y_1) p(y_2) p(y_3 | y_1, y_2) p(y_4) p(y_5 | y_3) p(y_6 | y_3, y_4) p(y_7 | y_5, y_6).

The global Markov property, on the other hand, summarizes all conditional independencies embedded in the directed acyclic graph by identifying the Markov Blanket of each node (Figure 10.3).

Fig. 10.3. A Bayesian network with seven variables and some of the Markov properties represented by its directed acyclic graph. The panel on the left describes the local Markov property encoded by a directed acyclic graph and lists the three Markov properties that are represented by the graph in the middle. The panel on the right describes the global Markov property and lists three of the seven global Markov properties represented by the graph in the middle. The vector in bold denotes the set of variables represented by the nodes in the graph.

10.3 Reasoning

The modularity induced by the Markov properties encoded by the directed acyclic graph is the core of many search algorithms for learning Bayesian networks from data. By the Markov properties, the overall distribution is broken into modules that can be interrelated, and the network summarizes all significant dependencies without information disintegration. In the network in Figure 10.3, for example, we can compute the probability distribution of the variable Y_7, given that the variable Y_1 is observed to take a particular value (prediction) or, vice versa, we can compute the conditional distribution of Y_1 given the values of some other variables in the network (explanation). In this way, a Bayesian network becomes a complete simulation system able to forecast the value of unobserved variables under hypothetical conditions and, conversely, able to find the most probable set of initial conditions leading to an observed situation.
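Both forms of reasoning reduce to computing a conditional probability from the factorized joint. The sketch below is not the chapter's algorithm (practical systems use propagation schemes such as the junction tree rather than brute-force enumeration), and its numbers are the same invented tables used earlier; it simply answers a prediction query and an explanation query on the small collider network by summing the joint over the unobserved variables:

```python
from itertools import product

# Collider network A -> C <- B with illustrative, made-up probability tables.
VARS = ("A", "B", "C")
p_A = {True: 0.3, False: 0.7}
p_B = {True: 0.6, False: 0.4}
p_C = {(True, True): 0.95, (True, False): 0.70, (False, True): 0.40, (False, False): 0.05}

def joint(assign):
    """p(a, b, c) from the factorization p(A) p(B) p(C | A, B)."""
    a, b, c = assign["A"], assign["B"], assign["C"]
    return p_A[a] * p_B[b] * (p_C[(a, b)] if c else 1.0 - p_C[(a, b)])

def posterior(query, evidence):
    """p(query=True | evidence) by enumeration over the unobserved variables."""
    hidden = [v for v in VARS if v != query and v not in evidence]
    mass = {True: 0.0, False: 0.0}
    for value in (True, False):
        for combo in product((True, False), repeat=len(hidden)):
            assign = dict(evidence)
            assign.update(zip(hidden, combo))
            assign[query] = value
            mass[value] += joint(assign)
    return mass[True] / (mass[True] + mass[False])

print(round(posterior("C", {"A": True}), 3))  # prediction:  p(C=True | A=True), ~0.85
print(round(posterior("A", {"C": True}), 3))  # explanation: p(A=True | C=True), ~0.58
```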
