Paola Sebastiani, Maria M. Abad, and Marco F. Ramoni

Variable     Description           Type      State description
Region       Hoh birth region      Nominal   England, Scotland and Wales
Ad fems      No of adult females   Ordinal   0, 1, ≥2
Ad males     No of adult males     Ordinal   0, 1, ≥2
Children     No of children        Ordinal   0, 1, 2, 3, ≥4
Hoh age      Age of Hoh            Numeric   17-36; 36-50; 50-66; 66-98
Hoh gend     Gender of Hoh         Nominal   M, F
Accomod      Accommodation         Nominal   Room, Flat, House, Other
Bedrms       No of bedrooms        Ordinal   1, 2, 3, ≥4
Ncars        No of cars            Ordinal   1, 2, 3, ≥4
Tenure       House status          Nominal   Rent, Owned, Soc-Sector
Hoh reslen   Length of residence   Numeric   0-3; 3-9; 9-19; ≥19 (months)
Hoh origin   Hoh ethnicity         Nominal   Caucas., Black, Chin., Indian, Other
Hoh status   Status of Hoh         Nominal   Active, Inactive, Retired

Table 10.2. Description of the variables used in the analysis. Hoh denotes the Head of the Household. Numbers of adult males, females and children refer to the household.

of the household increases. The dependency of the gender of the household head on the ethnic group shows that Black households have the smallest probability of having a male head (64%), while Indian households have the largest (89%). Another interesting discovery is that the age of the head of the household depends directly on the numbers of adult males and females: households with no females and two or more males are more likely to be headed by a young male, whereas households with no males and two or more females tend to be headed by a middle-aged female. There appear to be more single households headed by an elderly female than by an elderly male. The composition of the household also changes across ethnic groups: Indians have the smallest probability of living in a household with no adult males (10%), while Blacks have the largest probability (32%).
By propagating the network, one may investigate other undirected associations and discover that, for example, the typical Caucasian mid family with two children has a 77% chance of being headed by a male who, with probability .57, is aged between 36 and 50 years. The probability that the head of the household is active is .84, and the probability that the household is in an owned house is .66. Results of these queries are displayed in Figure 10.11.

Fig. 10.11. An example of probabilistic reasoning using the Bayesian network induced from the 13 variables extracted from the 1996 General Household Survey.

These figures are slightly different if the head of the household is, for example, Black: the probability that the head of the household is male (given that there are two children in the household) is only .62, and the probability that he is active is .79. If the head of the household is Indian, then the probability that he is male is .90, and the probability that he is active is .88. On average, the ethnic group slightly changes the probability of the household being in accommodation provided by the social service (26% for Blacks, 23% for Chinese, 20% for Indians and 24% for Caucasians). Similarly, Black household heads are more likely to be inactive than household heads from other ethnic groups (16% Blacks, 10% Indians, 14% Caucasians and Chinese) and to be living in a less wealthy household, as shown by the larger probability of living in accommodation with a smaller number of bedrooms and of having a smaller number of cars. The overall picture is that households headed by a Black person are less wealthy than others, and this is the conclusion one would reach if the gender of the head of the household were not taken into account. However, the dependency structure discovered shows that the gender of the head of the household and the number of adult females make all the other variables independent of the ethnic group.
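The screening-off pattern described above can be illustrated with a small numerical sketch. The three-node chain below (group, gender of the head, and a binary wealth indicator) and all of its probabilities are hypothetical and are not taken from the General Household Survey; the function names are likewise illustrative. The sketch computes conditional probabilities by brute-force enumeration of the joint distribution, which is adequate only for very small networks.

```python
from itertools import product

# Hypothetical conditional probability tables (NOT estimated from the survey).
# Assumed chain: group -> gender of head -> wealthy
P_GROUP = {"A": 0.5, "B": 0.5}
P_MALE_GIVEN_GROUP = {"A": 0.9, "B": 0.6}
P_WEALTHY_GIVEN_GENDER = {"M": 0.7, "F": 0.4}


def joint(group, gender, wealthy):
    """Joint probability of one full configuration of the toy chain."""
    p_male = P_MALE_GIVEN_GROUP[group]
    p_gender = p_male if gender == "M" else 1.0 - p_male
    p_w = P_WEALTHY_GIVEN_GENDER[gender]
    p_wealth = p_w if wealthy else 1.0 - p_w
    return P_GROUP[group] * p_gender * p_wealth


def prob_wealthy(**evidence):
    """P(wealthy = True | evidence), summing the joint over all states."""
    numerator = denominator = 0.0
    for group, gender, wealthy in product(P_GROUP, "MF", (True, False)):
        state = {"group": group, "gender": gender}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # configuration inconsistent with the evidence
        p = joint(group, gender, wealthy)
        denominator += p
        if wealthy:
            numerator += p
    return numerator / denominator


# Marginal association between group and wealth ...
print(prob_wealthy(group="A"), prob_wealthy(group="B"))
# ... which disappears once the gender of the head is known.
print(prob_wealthy(group="A", gender="F"), prob_wealthy(group="B", gender="F"))
```

In this toy chain the wealth indicator differs between the two groups marginally (0.67 against 0.58), yet the difference vanishes once the gender of the head is fixed, which is the kind of conditional independence the learned network expresses.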
Thus, the extracted model supports the hypothesis that differences in household wealth are more likely explained by the different household composition, and in particular by the gender of the head of the household, than by racial factors.

10.6.2 Customer Profiling

A typical problem of direct mail fund raising campaigns is the low response rate. Recent studies have shown that adding incentives or gifts in the mailing can increase the response rate. This is the strategy implemented by an American Charity in the June ’97 renewal campaign. The mailing included a gift of personalized name and address labels plus an assortment of 10 note cards and envelopes. Each mailing cost the charity 0.68 dollars and resulted in a response rate of about 5% in the group of so-called lapsed donors, that is, individuals who made their last donation more than a year before the ’97 renewal mail. Since the donations received from the respondents ranged between 2 and 200 dollars, and the median donation was 13 dollars, the fund raiser needed to decide when it was worth sending the renewal mail to a donor, on the basis of the information available about the donor in the in-house database. Furthermore, the charity was interested in strategies to recapture lapsed donors and, therefore, in building a profile from which to understand the motivations behind their lack of response.

We addressed these issues in (Sebastiani et al., 2000) by building two causal models. The first model captured the dependency of the probability of response to the mailing campaign on the independent variables in the database. The second one modeled the dependency of the dollar amount of the gift, and it was built by using only the 5% of donors who responded to the ’97 mailing campaign.
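The mailing decision described above can be made concrete with a simple break-even calculation. The sketch below is only an illustration: it assumes that the expected gift of a respondent can be approximated by the 13-dollar median donation (the mean, which is not reported here, would be the appropriate quantity), and the function names are hypothetical.

```python
COST_PER_MAIL = 0.68  # dollars, as reported for the '97 renewal campaign


def expected_profit(p_response, expected_gift, cost=COST_PER_MAIL):
    """Expected net return (in dollars) of mailing one donor."""
    return p_response * expected_gift - cost


def break_even_probability(expected_gift, cost=COST_PER_MAIL):
    """Smallest response probability at which a mailing pays for itself."""
    return cost / expected_gift


# Assumption: the median donation stands in for the expected gift.
assumed_gift = 13.0

threshold = break_even_probability(assumed_gift)
print(f"break-even response probability: {threshold:.4f}")
print(f"expected profit at a 5% response rate: "
      f"{expected_profit(0.05, assumed_gift):+.3f} dollars")
```

Under these assumptions the break-even response probability is about 5.2%, slightly above the 5% marginal response rate, which is why a model that separates likely from unlikely respondents matters for the decision.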
We focus here on the first model, depicted in Figure 10.12, which shows that the probability of a donation (variable Target-B in the top-left corner) is directly affected by the wealth rating (variable Wealth1) and the donor’s neighborhood (variable Domain1). The network shows that, marginally, only 5% of those who received the renewal mail are likely to respond. Persons living in suburbs, cities or towns have about 5% probability of responding, while donors living in rural or urban neighborhoods respond with probability 5%. The wealth rating of the donor neighborhood has a positive effect on the response rate of donors living in urban, suburban or city areas, with donors living in wealthier neighborhoods being more likely to respond than donors living in poorer neighborhoods. The probability of responding rises to about 6% for donors living in wealthy city neighborhoods. The variable Domain1 is closely related to the variable Domain2, which represents an indicator of the socio-economic status of the donor neighborhood, and it shows that donors living in suburbs or cities are more likely to live in neighborhoods having a highly rated socio-economic status. Therefore, they may be more sensitive to political and social issues. The model also shows that donors living in neighborhoods with a high presence of males active in the military (Malemili) are more likely to respond. Again, since the charity collects funds for military veterans, this fact supports the hypothesis that sensitivity to the problem for which funds are collected has a large effect on the probability of response. On the other hand, the wealth rating of donors living in rural neighborhoods has the opposite effect: the higher the wealth rating, the smaller the probability that the donor responds, and the least likely to respond (3.8%) are donors living in wealthy rural areas.
A curiosity is that persons living in rural and poor neighborhoods are more likely to respond positively to mail including a gift than donors living in wealthy city neighborhoods. By querying the network, we can profile respondents: compared with those who do not respond, they are more likely to live in a wealthy neighborhood located in a suburb, and less likely to have made a donation in the last 6 months. One feature that discriminates respondents from nonrespondents is the household income: respondents are 1.20 times more likely than nonrespondents to be living in wealthy neighborhoods and to have a higher income.

Fig. 10.12. The Bayesian network induced from the data warehouse to profile likely respondents to mail solicitations.

10.7 Conclusions and Future Research Directions

Bayesian networks are a representation formalism born at the intersection of statistics and Artificial Intelligence. Thanks to their solid statistical foundations, they have been successfully turned into a powerful Data Mining and knowledge discovery tool able to uncover complex models of interactions from large databases. Their highly symbolic nature makes them easily understandable to human operators. Contrary to standard classification methods, Bayesian networks do not require the preliminary identification of an outcome variable of interest, but are able to draw probabilistic inferences on any variable in the database.

Notwithstanding these attractive properties, there are still several theoretical issues that limit the range of applicability of Bayesian networks to the practice of science and engineering. This chapter has described methods to learn Bayesian networks from databases with either discrete or continuous variables. How to induce Bayesian networks from databases containing both types of variables is still very much an open research issue.
Imposing the assumption that discrete variables can only be parent nodes in the network, but cannot be children of any continuous Gaussian node, leads to a closed-form solution for the computation of the marginal likelihood (Lauritzen, 1992). This property has been applied, for example, to model-based clustering by (Ramoni et al., 2002), and it is commonly used in classification problems (Cheeseman and Stutz, 1996). However, this restriction can quickly become unrealistic and greatly limit the set of models to explore. As a consequence, common practice is still to discretize continuous variables, with possible loss of information, particularly when the continuous variables are highly skewed.

Another challenging research issue is how to learn Bayesian networks from incomplete data. The received view of the effect of missing data on statistical inference is based on the approach described by Rubin in (Rubin, 1987). This approach classifies the missing data mechanism as ignorable or not, according to whether the data are missing completely at random (MCAR), missing at random (MAR), or informatively missing (IM). According to this approach, data are MCAR if the probability that an entry is missing is independent of both observed and unobserved values. They are MAR if this probability is at most a function of the observed values in the database and, in all other cases, data are IM. The received view is that, when data are either MCAR or MAR, the missing data mechanism is ignorable for parameter estimation, but it is not when data are IM. An important but overlooked issue is whether the missing data mechanism generating data that are MAR is ignorable for model selection (Rubin, 1996, Sebastiani and Ramoni, 2001A).
We have shown that this is not the case for regression-type graphical models, and introduced two approaches to model selection with partially ignorable missing data mechanisms: ignorable imputation and model folding. Contrary to standard imputation schemes (Geiger et al., 1995, Little and Rubin, 1987, Schafer, 1997, Tanner, 1996, Thibaudeau and Winkler, 2002), ignorable imputation accounts for the missing-data mechanism and produces, asymptotically, a proper imputation model as defined by Rubin (Rubin, 1987, Rubin et al., 1995). However, its computational effort can be very demanding. Model folding, by contrast, is a deterministic method to approximate the exact marginal likelihood that reaches high accuracy at a low computational cost, because the complexity of the model search is not affected by the presence of incomplete cases. Both ignorable imputation and model folding reconstruct a completion of the incomplete data by taking into account the variables responsible for the missing data. This property is in agreement with the suggestion put forward in (Heitjan and Rubin, 1991, Little and Rubin, 1987, Rubin, 1976) that the variables responsible for the missing data should be kept in the model. However, our approach allows us to also evaluate the likelihoods of models that do not depend explicitly on these variables.

Although this work provides the analytical foundations for a proper treatment of missing data when the inference task is model selection, it is limited to the very special situation in which only one variable is partially observed, data are supposed to be only MCAR or MAR, and the set of Bayesian networks is limited to those in which the partially observed variable is a child of the other variables. Research is needed to extend these results to more general graphical structures, in which several variables can be partially observed and data can be MCAR, MAR or IM.
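The three missing-data mechanisms defined earlier in this section can be illustrated by simulation. The sketch below is not the ignorable imputation or model folding method; it is a minimal, assumed setup with one fully observed binary variable x and one partially observed binary variable y, in which all probabilities and function names are hypothetical. It compares a complete-case estimate of P(y = 1), which discards incomplete records, with an estimate that conditions on the observed x.

```python
import random


def simulate(mechanism, n=100_000, seed=0):
    """Generate (x, y) pairs where y may be missing (None).

    x is always observed; P(y=1 | x) is 0.8 or 0.2, so P(y=1) = 0.5.
    All rates below are illustrative assumptions.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        y = rng.random() < (0.8 if x else 0.2)
        if mechanism == "MCAR":    # missingness ignores x and y
            p_missing = 0.3
        elif mechanism == "MAR":   # missingness depends on observed x only
            p_missing = 0.6 if x else 0.1
        elif mechanism == "IM":    # missingness depends on the value of y
            p_missing = 0.6 if y else 0.1
        else:
            raise ValueError(f"unknown mechanism: {mechanism}")
        rows.append((x, None if rng.random() < p_missing else y))
    return rows


def complete_case_estimate(rows):
    """Proportion of y = 1 among records where y is observed."""
    seen = [y for _, y in rows if y is not None]
    return sum(seen) / len(seen)


def stratified_estimate(rows):
    """Sum over x of P(x) * P(y=1 | x, y observed)."""
    total = 0.0
    for x_value in (False, True):
        stratum = [y for x, y in rows if x == x_value]
        seen = [y for y in stratum if y is not None]
        total += (len(stratum) / len(rows)) * (sum(seen) / len(seen))
    return total


for mechanism in ("MCAR", "MAR", "IM"):
    data = simulate(mechanism)
    print(mechanism,
          round(complete_case_estimate(data), 3),
          round(stratified_estimate(data), 3))
```

With these assumed rates, the complete-case proportion should be close to the true value 0.5 only under MCAR, while the estimate that conditions on x should be close to 0.5 under both MCAR and MAR but not under IM, where the missingness depends on the unobserved value itself.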
These two issues, learning mixed-variable networks and handling incomplete databases, are still unsolved, and they offer challenging research opportunities.

Acknowledgments

This work was supported in part by the National Science Foundation (ECS-0120309), the Spanish State Office of Education and Universities, the European Social Fund and the Fulbright Program of the US State Department.

References

S. G. Bottcher and C. Dethlefsen. Deal: A package for learning Bayesian networks. Available from http://www.jstatsoft.org/v08/i20/deal.pdf, 2003.
U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for small-sample microarray classification? Bioinformatics, 20:374–380, 2004.
E. Castillo, J. M. Gutierrez, and A. S. Hadi. Expert Systems and Probabilistic Network Models. Springer, New York, NY, 1997.
E. Charniak. Belief networks without tears. AI Magazine, pages 50–62, 1991.
P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and results. In U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in Knowledge Discovery and Data Mining, pages 153–180. MIT Press, Cambridge, MA, 1996.
J. Cheng and M. Druzdzel. AIS-BN: An adaptive importance sampling algorithm for evidential reasoning in large Bayesian networks. J Artif Intell Res, 13:155–188, 2000.
D. M. Chickering. Learning equivalence classes of Bayesian-network structures. J Mach Learn Res, 2:445–498, February 2002.
G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief networks. Artif Intell, 42:297–346, 1990.
G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic networks from data. Mach Learn, 9:309–347, 1992.
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks and Expert Systems. Springer, New York, NY, 1999.
A. P. Dawid and S. L. Lauritzen. Hyper Markov laws in the statistical analysis of decomposable graphical models. Ann Stat, 21:1272–1317, 1993. Correction ibidem, (1995), 23, 1864.
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, NY, 1973.
N. Friedman. Inferring cellular networks using probabilistic graphical models. Science, 303:799–805, 2004.
N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Mach Learn, 29:131–163, 1997.
N. Friedman and D. Koller. Being Bayesian about network structure: A Bayesian approach to structure discovery in Bayesian networks. Mach Learn, 50:95–125, 2003.
N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic networks. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial Intelligence (UAI-98), pages 139–147, San Francisco, CA, 1998. Morgan Kaufmann Publishers.
D. Geiger and D. Heckerman. Learning Gaussian networks. In Proceedings of the Tenth Annual Conference on Uncertainty in Artificial Intelligence (UAI-94), San Francisco, 1994. Morgan Kaufmann.
D. Geiger and D. Heckerman. A characterization of Dirichlet distributions through local and global independence. Ann Stat, 25:1344–1368, 1997.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman and Hall, London, UK, 1995.
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE T Pattern Anal, 6:721–741, 1984.
W. R. Gilks and G. O. Roberts. Strategies for improving MCMC. In W. R. Gilks, S. Richardson, and D. J. Spiegelhalter, editors, Markov Chain Monte Carlo in Practice, pages 89–114. Chapman and Hall, London, UK, 1996.
C. Glymour, R. Scheines, P. Spirtes, and K. Kelly. Discovering Causal Structure: Artificial Intelligence, Philosophy of Science, and Statistical Modeling. Academic Press, San Diego, CA, 1987.
I. J. Good. Rational decisions. J Roy Stat Soc B, 14:107–114, 1952.
I. J. Good. The Estimation of Probability: An Essay on Modern Bayesian Methods. MIT Press, Cambridge, MA, 1968.
D. J. Hand. Construction and Assessment of Classification Rules. Wiley, New York, NY, 1997.
D. J. Hand, N. M. Adams, and R. J. Bolton. Pattern Detection and Discovery. Springer, New York, 2002.
D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, Cambridge, 2001.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-Verlag, New York, 2001.
D. Heckerman. Bayesian networks for Data Mining. Data Min Knowl Disc, 1:79–119, 1997.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Mach Learn, 20:197–243, 1995.
D. F. Heitjan and D. B. Rubin. Ignorability and coarse data. Ann Stat, 19:2244–2253, 1991.
R. E. Kass and A. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In Proceedings of the Tenth National Conference on Artificial Intelligence, pages 223–228, Menlo Park, CA, 1992. AAAI Press.
P. Larranaga, C. Kuijpers, R. Murga, and Y. Yurramendi. Learning Bayesian network structures by searching for the best ordering with genetic algorithms. IEEE T Pattern Anal, 26:487–493, 1996.
S. L. Lauritzen. Propagation of probabilities, means and variances in mixed graphical association models. J Am Stat Assoc, 87(420):1098–108, 1992.
S. L. Lauritzen. Graphical Models. Oxford University Press, Oxford, UK, 1996.
S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). J Roy Stat Soc B, 50:157–224, 1988.
R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley, New York, NY, 1987.
D. Madigan and A. E. Raftery. Model selection and accounting for model uncertainty in graphical models using Occam’s window. J Am Stat Assoc, 89:1535–1546, 1994.
D. Madigan and G. Ridgeway. Bayesian data analysis for Data Mining. In Handbook of Data Mining, pages 103–132. MIT Press, 2003.
D. Madigan and J. York. Bayesian graphical models for discrete data. Int Stat Rev, pages 215–232, 1995.
P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London, 2nd edition, 1989.
A. O’Hagan. Bayesian Inference. Kendall’s Advanced Theory of Statistics. Arnold, London, UK, 1994.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, San Francisco, CA, 1988.
M. Ramoni, A. Riva, M. Stefanelli, and V. Patel. An ignorant belief network to forecast glucose concentration from clinical databases. Artif Intell Med, 7:541–559, 1995.
M. Ramoni and P. Sebastiani. Bayesian methods. In Intelligent Data Analysis. An Introduction, pages 131–168. Springer, New York, NY, 2nd edition, 2003.
M. Ramoni, P. Sebastiani, and I. S. Kohane. Cluster analysis of gene expression dynamics. Proc Natl Acad Sci USA, 99(14):9121–6, 2002.
L. Rokach, M. Averbuch, and O. Maimon. Information retrieval system for medical narrative reports. Lecture Notes in Artificial Intelligence, 3055:217–228. Springer-Verlag, 2004.
D. B. Rubin. Inference and missing data. Biometrika, 63:581–592, 1976.
D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley, New York, NY, 1987.
D. B. Rubin. Multiple imputation after 18 years. J Am Stat Assoc, 91:473–489, 1996.
D. B. Rubin, H. S. Stern, and V. Vehovar. Handling “don’t know” survey responses: the case of the Slovenian plebiscite. J Am Stat Assoc, 90:822–828, 1995.
M. Sahami. Learning limited dependence Bayesian classifiers. In Proceedings of the 2nd Int. Conf. on Knowledge Discovery & Data Mining, 1996.
J. L. Schafer. Analysis of Incomplete Multivariate Data. Chapman and Hall, London, UK, 1997.
P. Sebastiani, M. Abad, and M. F. Ramoni. Bayesian networks for genomic analysis. In E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, editors, Genomic Signal Processing and Statistics, Series on Signal Processing and Communications. EURASIP, 2004.
P. Sebastiani and M. Ramoni. Analysis of survey data with Bayesian networks. Technical Report, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes MK7 6AA, 2000. Available from authors.
P. Sebastiani and M. Ramoni. Bayesian selection of decomposable models with incomplete data. J Am Stat Assoc, 96(456):1375–1386, 2001A.
P. Sebastiani and M. Ramoni. Common trends in European school populations. Res. Offic. Statist., 4(1):169–183, 2001B.
P. Sebastiani and M. F. Ramoni. On the use of Bayesian networks to analyze survey data. Res. Offic. Statist., 4:54–64, 2001C.
P. Sebastiani and M. Ramoni. Generalized gamma networks. Technical report, University of Massachusetts, Department of Mathematics and Statistics, 2003.
P. Sebastiani, M. Ramoni, and A. Crea. Profiling customers from in-house data. ACM SIGKDD Explorations, 1:91–96, 2000.
P. Sebastiani, M. Ramoni, and I. Kohane. BADGE: Technical notes. Technical report, Department of Mathematics and Statistics, University of Massachusetts at Amherst, 2003.
P. Sebastiani, M. F. Ramoni, V. Nolan, C. Baldwin, and M. H. Steinberg. Discovery of complex traits associated with overt stroke in patients with sickle cell anemia by Bayesian network modeling. In 27th Annual Meeting of the National Sickle Cell Disease Program, 2004. To appear.
P. Sebastiani, Y. H. Yu, and M. F. Ramoni. Bayesian machine learning and its potential applications to the genomic study of oral oncology. Adv Dent Res, 17:104–108, 2003.
R. D. Shachter. Evaluating influence diagrams. Operations Research, 34:871–882, 1986.
M. Singh and M. Valtorta. Construction of Bayesian network structures from data: A brief survey and an efficient algorithm. Int J Approx Reason, 12:111–131, 1995.
D. J. Spiegelhalter and S. L. Lauritzen. Sequential updating of conditional probabilities on directed graphical structures. Networks, 20:157–224, 1990.
P. Spirtes, C. Glymour, and R. Scheines. Causation, Prediction and Search. Springer, New York, 1993.
M. A. Tanner. Tools for Statistical Inference. Springer, New York, NY, third edition, 1996.
Y. Thibaudeau and W. E. Winkler. Bayesian networks representations, generalized imputation, and synthetic microdata satisfying analytic restraints. Technical report, Statistical Research Division report RR 2002/09, 2002. http://www.census.gov/srd/www/byyear.html.
A. Thomas, D. J. Spiegelhalter, and W. R. Gilks. Bugs: A program to perform Bayesian inference using Gibbs Sampling. In J. Bernardo, J. Berger, A. P. Dawid, and A. F. M. Smith, editors, Bayesian Statistics 4, pages 837–42. Oxford University Press, Oxford, UK, 1992.
J. Whittaker. Graphical Models in Applied Multivariate Statistics. Wiley, New York, NY, 1990.
S. Wright. The theory of path coefficients: a reply to Niles’ criticism. Genetics, 8:239–255, 1923.
S. Wright. The method of path coefficients. Annals of Mathematical Statistics, 5:161–215, 1934.
J. Yu, V. Smith, P. Wang, A. Hartemink, and E. Jarvis. Using Bayesian network inference algorithms to recover molecular genetic regulatory networks. In International Conference on Systems Biology 2002 (ICSB02), 2002.
H. Zhou and S. Sakane. Sensor planning for mobile robot localization using Bayesian network inference. J. of Advanced Robotics, 16, 2002. To appear.

11 Data Mining within a Regression Framework

Richard A. Berk

Department of Statistics, UCLA
berk@stat.ucla.edu

Summary. Regression analysis can imply a far wider range of statistical procedures than often appreciated. In this chapter, a number of common Data Mining procedures are discussed within a regression framework. These include non-parametric smoothers, classification and regression trees, bagging, and random forests.
In each case, the goal is to characterize one or more of the distributional features of a response conditional on a set of predictors.

Key words: regression, smoothers, splines, CART, bagging, random forests

O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed., DOI 10.1007/978-0-387-09823-4_11, © Springer Science+Business Media, LLC 2010

11.1 Introduction

Regression analysis can imply a broader range of techniques than ordinarily appreciated. Statisticians commonly define regression so that the goal is to understand “as far as possible with the available data how the conditional distribution of some response y varies across subpopulations determined by the possible values of the predictor or predictors” (Cook and Weisberg, 1999). For example, if there is a single categorical predictor such as male or female, a legitimate regression analysis has been undertaken if one compares two income histograms, one for men and one for women. Or, one might compare summary statistics from the two income distributions: the mean incomes, the median incomes, the two standard deviations of income, and so on. One might also compare the shapes of the two distributions with a Q-Q plot.

There is no requirement in regression analysis for there to be a “model” by which the data were supposed to be generated. There is no need to address cause and effect. And there is no need to undertake statistical tests or construct confidence intervals. The definition of a regression analysis can be met by pure description alone. Construction of a “model,” often coupled with causal and statistical inference, is a supplement to a regression analysis, not a necessary component (Berk, 2003).

Given such a definition of regression analysis, a wide variety of techniques and approaches can be applied. In this chapter, I will consider a range of procedures