Visualizing the Loss of Diversity in Genetic Programming Jason M. Daida, David J. Ward, Adam M. Hilss, Stephen L. Long, Mark R. Hodges, and Jason T. Kriesel Center for the Study of Complex Systems and Space Physics Research Laboratory 2455 Hayward Avenue Ann Arbor, Michigan 48109-2143 daida@eecs.umich.edu Abstract- This paper introduces visualization techniques that allow for a multivariate approach in understanding the dynamics that underlie genetic programming (GP). Emphasis is given toward understanding the relationship between problem difficulty and the loss of diversity. The visualizations raise questions about diversity and problem solving efficacy, as well as the role of the initial population in determining solution outcomes. I. INTRODUCTION There has been a significant amount of evolutionary computation on the matter of diversity. Part of this interest extends well into some of the earliest investigations (e.g., [1-4]). Work such as these has helped to shape our current understanding of the need for diversity in genetic and evolutionary computation: namely, that diversity helps to escape premature convergence. It suggests that increasing diversity could lead to more robust problem solving. However, there are a few instances that argue for a nuanced understanding of the role of diversity in problem solving with GP. These include the following: • Burke et al. [5] systematically examined various measures of diversity, some of which have been devised in support of a particular diversity-promoting technique. They concluded that the relationship between diversity and run performance is not straightforward. More robust problem solving is not necessarily a result of more diversity, depending which definition is used. • Luke et al. [6] demonstrated that dynamically decreasing the size of a population over the course of a run might be a computationally effective method. Their idea and subsequent method run counter to the assumption that a large, presumably diverse population needs to be maintained at all times. It could be argued that Luke et al.’s work shows that less diversity could lead to more robust diversity. Unfortunately, a nuanced understanding of diversity has been difficult to achieve. This difficulty exists partly because such studies require large amounts of data for examination. Typical diversity studies require capturing data on each individual that is created over the course of a GP run. Even modest studies that involve relatively small populations of 500 individuals can generate on the order of one terabyte of data, if only because multiple trials need to be run. This difficulty exists also because the dynamics associated with GP also tends to be multivariate and not well understood. There are few standard methods for conducting multivariate analyses over large data sets in which some of the key variables and interactions are unknown. In situations like these, it is subsequently common to employ data exploration methods—quantitative multivariate visualization techniques—to glean possible linkages between possible key variables and interactions. Unfortunately, such visualization techniques are non- standard for more than a handful of variables and have not been available for GP. This paper builds upon the methods and results of two key papers [5, 7] that have been published on the subject of diversity. We do so by incorporating newly developed techniques for data visualizing in GP. Some of these techniques have been borrowed from other work (i.e., [8]) for visualizing populations of trees. Other techniques have been developed specifically by us and are reported here. In using these visualization techniques, we corroborate what these two papers have found and reveal for the first time some of the temporal dynamics that underlie population histories in GP. The organization of this paper is as follows. Section 2 details our motivation and our subsequent definitions. Section 3 summarizes the procedure that was used to generate our data. Section 4 addresses methods and results for visualizing data from single runs of GP, while Section 5 addresses methods and results for multiple runs. Section 6 discusses implications of the data visualized. Section 7 concludes this paper. II. BACKGROUND A. Motivation This paper builds from the findings and suggestions of two studies [5, 7] that focused on understanding diversity in the context of population dynamics in GP. Our motivations are described here. Both works indirectly suggest the possibility for a multivariate approach to understanding diversity. McPhee and Hopper’s work suggested that population dynamics would be of interest, but their analysis featured one-variable statistics and did not figure in time as a variable. Burke et al.’s work was almost exclusively based on two-variable statistics, but they concluded that such an approach did not 0-7803-8515-2/04/$20.00 ©2004 IEEE 1225 allow for them to understand the dynamics of a population. Consequently, this paper describes quantitative multivariate visualization techniques to identify possible linkages between multiple variables. It would serve as a precursor to a multivariate statistical analysis. McPhee and Hopper described a rigorous method, but replication and extension of their method has been difficult. In comparison to the other measures of diversity reported in Burke et al.’s work, McPhee and Hopper’s method is arguably the most rigorous. One could derive other measures of diversity from a data set generated using McPhee and Hopper’s method, which is not possible to do so the other way around. Unfortunately, the analysis conducted by McPhee and Hopper was based on 20-trial statistics—it was perhaps unwieldy at the time to do more. It did not help that their experiments used non-standard GP settings (i.e., steady-state and the use of a size normalization threshold). Consequently, this paper used a form of McPhee and Hopper’s method on a substantially larger study that allows for 200-trial statistics. Burke et al. partially replicated McPhee and Hopper’s work, but not the part that tracked the history of initial population individuals. The omission is noteable, because they were looking for early indicators for run performance and because McPhee and Hopper indicated that only a few initial population individuals contribute to a final population. The omission is also understandable, because the amount of work required to do the bookkeeping of nodes to initial population individuals is nontrivial. Nevertheless, we focused on the initial population individuals because of their potential importance to understanding GP dynamics. B. Definitions There are several terms that are used throughout this paper. They are all nonstandard, given that much of the previous literature has been influenced by a Markovian type of analysis (i.e., focus on the transition from time t i to t i+1 ). In contrast, our analysis has been motivated with the intent of using the initial population as a predictor of solution outcomes. Our terms and their associated definitions are subsequently reflective of this particular intent. These definitions presume that standard GP is being discussed, although extensions to other methods in evolutionary computation are also possible. • V 0 . Given an initial population P 0 , let every node / vertex in this population be uniquely identified and labeled to form a set V 0 . For example, an initial population P 0 of 500 individuals could consists of well over 14,000 nodes, depending on the type of population initalization scheme that was used. The membership of V 0 presumes that each of these 14,000+ nodes is considered as uniquely labeled. • Ancestor. An ancestor A i specifically refers to an individual from the initial population P 0 . Since all nodes in the initial population are unique, it follows that an ancestor A i consists of a unique set of nodes A i ⊇ V 0 that is mutually exclusive from the set of nodes A j , which characterizes an ancestor A j , where i ≠ j. For example, a three-node tree in P 0 is presumed to be uniquely identified by a set A = {a 1 , a 2 , a 3 }, given that A ⊇ V 0 and the elements a 1 , a 2 , and a 3 are not a part of any other tree in P 0 ,. • Lineage. Lineage L B for an individual B at some generation t i , i ≠ 0, specifically refers to the set of ancestors A that contribute to that individual. Consequently, L B ≡∀A ∈ P 0 : ∃a ∈ A ← A and a ∈ B ← B {} .(1) For example, an individual B can have a lineage that consists of {A 1 , A 2 , A 3 } if there is some node from each of these ancestors that exists in B. • Frequency of Ancestor Occurrence. Frequency of ancestor occurrence n A specifically refers to the number of individuals in a given population that have ancestor A as part of their lineage. For example, in the initial population P 0 , each individual A i has a frequency of ancestor occurance n A = 1. • Surviving Number of Ancestors. Surviving number of ancestors n s for a given population refers to the number of ancestors that contribute at least one node to that population. For example, in the initial population P 0 that is of population size M, n s = M. III. EXPERIMENTAL PROCEDURE We analyzed GP on a particular, well documented, tunably difficult test problem (i.e., binomial-3). The problem been designed as a probe for understanding GP dynamics and represents an instance of data modeling. The binomial-3 is discussed in detail in [9, 10]. In brief, the problem is an instance taken from symbolic regression and involves solving for the function f(x) = 1 + 3x + 3x 2 + x 3 . Fitness cases are 50 equidistant points generated from f(x) over the interval [-1, 0). The function set is {+, –, ×, ÷}, which corresponds to arithmetic operators of addition, subtraction, multiplication, and protected division. A terminal set was { X, R}, where X is the symbolic variable and R is the set of ephemeral random constants that are distributed uniformly over the interval [- α , α ]. The tuning parameter is α , which is a real number. The binomial-3 can be tuned from a relatively easy problem (e.g., α = 1) to a difficult one (e.g., α = 1000). We used a modified version of lilgp [11] similar to that used in [9]. Most of the modifications were for bug fixes and for the replacement of the random number generator with the Mersenne Twister [12]. Other significant modifications included augmenting the data structure associated with each node to include an integer ID that serves as that node’s serial number. Each ID is unique to a node and is generated once during population initialization. We configured lilgp to run as a single thread. The ID labeling scheme was a result of [9], but is similar to that desccribed in [7]. McPhee and Hopper’s scheme called for tagging each node in the initial population with 1226 integer label pairs (ID:memID). The ID part of their label is assigned just once at population initialization and consists of an integer that is unique to a node relative to the set of nodes that make up the initial population (i.e., V 0 ). ID is used as a serial number that can be used to track individual nodes. memID is used for providing an audit trail for subtree memberships. For our work, we implemented what amounts to just the ID portion of their integer pair. Table 1 lists the parameter settings considered in this paper. Most of the GP parameters were similar to those mentioned in [13], Chapter 7. TABLE 1. PARAMETER SETTINGS Parameter Setting Selection Tournament q=7 or Proportionate Population Size M 500 Initialization Method Ramped Half-and-Half Initialization Depths 2–6 Levels Max Generations G 200 Maximum Depth 26 Internal Node Bias 90% internal, 10% terminals Termination Criteria Run reaches G Binomial-3 α 1 or 1000 Number of Trials 200 We used four different experimental configurations, given that we considered two different selection methods and two different difficulty settings. There were 200 trials taken per configuration for a total of 800 trials. Although the number of trials is modest, what was unusual were the data requirements specified for each trial. Each trial resulted in recording each individual in every population that was generated during a trial for a total of 201 generations (i.e., 200 specified generations and the initial population). The total amount of data that was generated by these four configurations was about 0.5 TB. Post-processing of this population data was done in two stages: the first stage reduced individuals to their appropriate lineages and structures, while the second stage visualized the reduced data. Both stages were custom-coded: the first stage was coded in PERL; the second stage, in Mathematica. There were 80,400,000 trees that were parsed and analyzed in this manner (i.e., 4 configurations, 200 trials per configuration, 201 generations per trial, 500 individual trees per generation). Post-processing time was about one CPU-month per configuration. GP trials and data reduction were run on a Linux workstation. Visualization was done on a Power Mac. IV. SINGLE-TRIAL METHODS AND RESULTS The visualization principles employed in this work were those described by Tufte in [14-16]. Tufte’s work has been influential in the design of data graphics. He is known for a particular, minimalist style of visualization that we use in the construction of nonstandard graphics. This particular style is not commonly used in the evolutionary computation community. For that reason, it is worthwhile quoting several of his design principles from [14] (pp. 105, 121): • Above all else show the data • Maximize the data-ink ratio • Erase non-data-ink • Erase redundant data-ink We have used these principles to generate visualizations of single- and multiple-trial dynamics. In comparison to the one- and two-variable techniques in [5, 7], our technique is a ten-variable visualization: population structure (three variables), lineage mapping (i.e., ancestors to individuals in a current population), frequency of ancestor occurrence n A , rank ordering of individuals in the current generation, rank ordering of ancestors, surviving number of ancestors n s , time, and problem difficulty. Of these ten variables, visualizing the three variables associated with population structure is based on a relatively new technique that was introduced in [8]. The technique calls for each tree to be layed out on a circular grid, as shown in Figures 1 and 2. 1 12 13 14 15 3 6 7 89 10 11 2 4 5 1 7 3 2 6 5 4 11 10 9 8 15 14 13 12 (a) (b) Figure 1. Mapping a full binary tree to a circular gird. (a) Full binary tree of depth 3. (b) Corresponding circular grid. 1 3 89 10 11 2 4 5 1 3 2 5 4 11 10 9 8 (a) (b) Figure 2. Mapping a partial binary tree to a circular grid. (a) Binary tree of depth 3. (b) Corresponding mapping of this tree on a circular grid. Cumulative distributions for a population were then computed for each grid point. This step is analogous to overlaying tree structures one on top of the other. As a result, darker lines correspond to links that are more frequently used in a population. The variables that this visualization would encode for are structure (two dimensions) and cumulative distribution. One typically uses this method to see which structures are more frequently used in a population. The method is useful inasmuch as structure has been an integral part of GP theory (e.g., [17-19]) Figure 3a shows an example of visualization of a population of 500. 1227 A visualization that shows every occupied grid point for a population can detract away from understanding which structures are most typically used. This happens because visible lines can be made only so thin and unintentional occlusion takes place. Consequently, we displayed only those structures that are common to a majority of a population. Figure 3b shows the same example as given in Figure 3a, except that only the structures that are in the majority is shown. (a) (b) Figure 3. Visualization of Population Structure. The data is from generation 200 for the configuration α = 1000 and tournament selection. (a) Visualization of a population of 500 trees. Darker lines indicate greater frequencies of those structures that appear in that population. The gray circle is for reference only and corresponds to depth level 26. (b) Majority-only view. This view shows the same data as in (a), but only for those structures in which 50% or more of population uses are shown. Visualizing the next five variables was accomplished by using a modified form of a histogram. The five variables that were considered in this part of the visualization included the following: lineage mapping (i.e., ancestors to individuals in a current population), n A , rank ordering of individuals in the current generation, rank ordering of ancestors, and n s . These variables address the issue of content use at the level of individual and population. We note that structure is implicit at these levels of examination, since individuals are identified by their nodes and nodes are organized by their associated tree structure. Figure 4 gives an example of the modified histogram. This modified histogram consists of two parts: the upper part, which shows the n A ; and the lower part, which shows the associated lineage mapping from ancestors to currrent generation individuals. Concerning the upper part, the x-axis of the frequency-of-ancestor-occurrence histogram is rank ordered. This means that ancestors are positionally identified from the least fit (left) to the most fit (right). The ordering of ancestors is determined just once at generation 0 and does not change during subsequent generations. The y- axis of the histogram is normalized according to population size M, since that is the maximum value that is attainable by any one ancestor. The frequency data that is displayed is specific to the given current generation. An absence of a bar in the histogram part of this visualization means that an ancestor has not survived. This absense is indicative of the complement of the variable that describes the surviving number of ancestors n s . The lower part of this modified histogram shows the associated lineage mapping from ancestors to the current generation. As in the upper part, the ancestors are positionally identified from the least fit (left) to the most fit (right). The current-generation individuals are ordered likewise. Figure 4 shows an instance where an ancestor at rank position a has a lineage that maps to multiple individuals in the current generation. Arrows at b and c identify a few of those mappings The lines that link an ancestor to a current-generation individual are darker as the rank of the current-generation individual increases. Both frequency-of-ancestor-occurrence histogram and lineage mapping are correlated. For example, the ancestor at position a in Figure 4 shows a frequency of occurance that is low. Likewise, the number of mappings from that ancestor to the current-generation individuals is sparse. Likewise, if the frequency of ancestor occurrence were at maximum (i.e., M ), it would mean that parts of that ancestor are distributed throughout the entire population. Figure 5 shows the complete ten-variable visualization for an “easy” problem (which featured tournament selection and α = 1). To show the first eight variables, the visualizations for population structure and the modified histogram for content were shown in tandem for a given generation. To show the variable of time, we used the technique of small multiples (see [14, 15]). The number that is embedded in the frequency-of-ancestor-occurence part of the modified histogram corresponds to the current generation. Consequently, Figure 5 shows the dynamics of population structure and content for the first 22 generations, which were Lineage Mapping Frequency of Ancestor Occurrence (Histogram) Ancestors A Current Generation Individuals B Rank a bc Figure 4. Visualization of Content at the Level of Individual and Population. There are two main parts to this visualization: frequency of ancestor occurrence histogram and lineage mapping. The visualization that is shown is from generation 3 of α = 1, fitness proportionate selection. 1228 sampled for visualization every two generations. To show the variable of problem difficulty, we used a simple thermometer icon at the bottom of the figure. The particular metric that is displayed is the percentage of successful trials that produced a “perfect” solution. Consequently, the easier a problem is to solve, the greater the number of trials out of the total number of trials that end up producing a “perfect” solution. The icon that is shown in Figure 5 corresponds to a relatively easy problem: about 7 out of every 10 trials produced a “perfect” solution. We note that the data that was used to produce the thermometer icon was based off of results from [10]. This was done in part because the metrics associated with difficult problems correspond to low probabilities (i.e., < 1%). The results given in were based on 600 trials, instead of the 200 trials given here. However, the statistics concerning this metric between data sets were comparable. Figure 6 shows contrasting results for a “difficult” problem. 0 2 4 6 8 10 12 14 16 18 20 22 70% Figure 5. Ten-Variable Visualization of Structure and Content for an “Easy” Problem (i.e., tournament selection and α = 1). The data that is shown is fo r the first 22 generations of a successful trial. 1229 V. MULTI-TRIAL METHODS AND RESULTS There is a trade-off in visualizing single versus multiple trials. Although single-trial results provide insight on nuances in GP dynamics that occur during the course of a run, it is difficult to place those nuances in an appropriate statistical context. Unfortunately, the ten-variable visualization of Section IV does not scale easily to account for multiple trials. For that reason, we reduced the number of variables that can be examined at once. In doing so, we chose to emphasize just a handful of variables so that more of the data can be shown. We used visualizations that were generated using methods from Section IV to identify possible variables for further study. We settled on a five-variable visualization: surviving number of ancestors n s , time, problem difficulty, selection method, and cumulative distribution. The visualization is constructed from common methods in data visualization, 0 2 4 6 8 10 12 14 16 18 20 22 1% Figure 6. Ten-Variable Visualization of Structure and Content for a “Very Difficult” Problem (i.e., fitness proportionate selection and α = 1000). The data that is shown is for the first 22 generations of a successful trial. 1230 whereby the each of the four major elements of this visualization correspond to a density plot. The x-axis is time (in generations) and the y-axis is the surviving number of ancestors. The maximum surviving number of ancestors is M, which occurs at generation 0. Oftentimes in evolutionary computation, the plot most commonly employed to display quantities that vary as a function of time is a line graph. In this case, the cumulative distributions are inferred by the number of lines that occur per square unit in a plot. Instead of resorting to this technique, however, cumulative distributions were computed for every nonzero tuple of surviving number of ancestors and time. A density plot was then used to display this data instead of line graphs. The convention used in this paper has darker tones indicating higher cumulative distributions. Consequently, each density plot describes three variables (i.e., surviving number of ancestors n s , time, cumulative distribution). The remaining two variables—selection method and problem difficulty—were displayed using the method of small multiples. Each density plot corresponds to a variation of one of these variables. Density plots are subsequently arranged as a two-dimensional matrix. However, because problem difficulty resulted in values that were not binary quantities, we used the thermometer icon (as was described in Section IV). Figure 7 shows the results of the data described in Section III. This visualization is a complete (non-sampled) summary of approximately 80 million trees that were parsed into lineages of 400,000 ancestors. It represents a view of approximately 0.5 TB of data. (Some data is not visualized since only the range [0, 150] for the surviving number of ancestors is shown. The complete range is [0, M], where M = 500.) VI. DISCUSSION Our results do corroborate many of the findings and speculations in [5, 7]. For example, the results bear out the speculations in [5], which suggested a nuanced approach in understanding diversity. Not only do the results show signficant temporal behaviors as was shown in [5], but also show that other previously neglected factors—like problem difficulty and selection method—can signficantly influence GP dynamics (i.e., Figure 7). Furthermore, the results also corroborate some of the earliest finding concerning diversity in GP [7], even though that work was based on a limited statistical sample and used nonstandard GP settings. Even though the argument in [7] for an “Eve” is perhaps overstated, the results in Figure 7 do show that for certain configurations, the amount of individuals that contribute to a population is but a fraction of the original population. Figures 5 and 6 also corroborate what is suggested in [7] 0 50 100 150 200 0 20 40 60 80 100 120 140 0 50 100 150 200 0 20 40 60 80 100 120 140 0 50 100 150 200 0 20 40 60 80 100 120 140 0 50 100 150 200 0 20 40 60 80 100 120 140 Time (Generations) Time (Generations) Surviving Number of Ancestors Surviving Number of Ancestors Time (Generations) Time (Generations) Surviving Number of Ancestors Surviving Number of Ancestors 70% 34% 37% 1% (a) (b) (c) (d) Figure 7. Five-Variable Visualization of the Loss in Diversity by Problem Difficulty and Selection Method. Each plot corresponds to a particular selection method and tuning parameter setting. (a) Fitness proportionate, α = 1. (b) Tournament, α = 1. (c) Fitness proportionate, α = 1000. (d) Tournament, α = 1000. 1231 concerning structure—i.e., that there is a convergence of both structure and content. The visualizations shown in Figures 5, 6, and 7 raise signficant issues about the dynamics that occur during the course of a GP run. Although it is beyond the scope of this paper to describe these issues at length, we summarize two of them here. • What is the relationship between diversity and problem solving ability? Fitness proportionate selection was one of the earliest methods that were used to enhance diversity for the purpose of robust problem solving [2, 3]. On one hand, the results in Figure 7 do support the contention that fitness proportionate selection does enhance diversity, since there is much more of V 0 that was retained. However, the results indicated that simply enhancing diversity during the course of a GP run was not intrinsically helpful. In both cases, “hard” selection resulted in significantly better performances (i.e., problems became significantly easier to solve under tournament selection). This occurred in spite of large losses in diversity. This finding supports previous, albeit controversial work in [20, 21]. • What are the dynamics between population size and diversity? On one hand, the theory that describes the dynamics under fitness proportionate selection is complex, which makes it difficult to make comparisons of this work with existing theory. On the other hand, there have been some investigations in tournament selection (e.g., [22, 23]) that point to dynamics that are independent of fitness distributions for a given population. Given an analysis of ancestors and lineage, their work would only apply to predicting the number of ancestors in generation 1. However, Figures 5 and 6 indicate that mixing between ancestors eventually becomes so thorough that many of the individuals in a current population can trace its lineage to most of the surviving ancestors. For that reason, it could be that the loss of diversity measures in [22, 23] might also apply to the steady state that occurs in Figures 7b and 7d. The existence of this steady state under tournament selection would also indicate why a population implosion method, as described in [6] would work. In particular, if only a fraction of ancestors are used to form a solution, then it might make sense to dynamically decrease the population size over the course of a GP run. VII. CONCLUSIONS This paper introduced several visualization techniques that enabled a multivariate exploration of GP dynamics and diversity. The resulting visualizations produced detailed patterns of temporal behaviors that occurred in what amounted to one of of the largest studies of its kind to date. This study required the tracking the lineage of 80,000,000 to some 400,000 individuals in initial populations. The visualizations summarize approximately 0.5 TB of data. The results corroborate previous theoretical and empirical studies that have been published on the subject of diversity in GP. The visualizations also raise further questions on the role of diversity in GP. REFERENCES [1] D. B. Fogel, Evolutionary Computation: The Fossil Record, Piscataway: IEEE Press, 1998, pp. 641. [2] M. Pincus, “An Evolutionary Strategy,” Journal of Theoretical Biology, vol. 28, pp. 483–488, 1970. [3] R. Galar, “Handicapped Individua in Evolutionary Processes,” Biological Cybernetics, vol. 53, pp. 1–9, 1985. [4] J. H. Holland, Adaptation in Natural and Artificial Systems. Ann Arbor: University of Michigan Press, 1975. [5] E. Burke, et al., “A Survey and Analysis of Diversity Measures in Genetic Programming,” in GECCO-2002, W. B. Langdon, et al., Eds. San Francisco: Morgan Kaufmann Publishers, 2002, pp. 716–723. [6] S. Luke, et al., “Population Implosion in Genetic Programming,” in GECCO 2003, E. Cantú-Paz, et al., Eds. Berlin: Springer-Verlag, 2003, pp. 1729–1739. [7] N. F. McPhee and N. J. Hopper, “Analysis of Genetic Diversity through Population History,” in GECCO ’99, W. Banzhaf, et al., Eds. San Francisco: Morgan Kaufmann Publishers, 1999, pp. 1112 – 1120. [8] J. M. Daida, et al., “Visualizing Tree Structures in Genetic Programming,” in GECCO ’03, E. Cantú-Paz, et al., Eds. Berlin: Springer-Verlag, 2003, pp. 1652–1664. [9] J. M. Daida, et al., “Analysis of Single-Node (Building) Blocks in Genetic Programming,” in Advances in Genetic Programming 3, L. Spector, et al., Eds. Cambridge: The MIT Press, 1999, pp. 217–241. [10] J. M. Daida, et al., “What Makes a Problem GP-Hard? Analysis of a Tunably Difficult Problem in Genetic Programming,” Genetic Programming and Evolvable Machines, vol. 2, pp. 165–191, 2001. [11] D. Zongker and W. Punch, “lilgp,” v. 1.02 ed. Lansing: Michigan State University Genetic Algorithms Research and Applications Group, 1995. [12] M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623- Dimensionally Equidistributed Uniform Pseudorandom Number Generator,” ACM Transactions on Modeling and Computer Simulation, vol. 8, pp. 3–30, 1998. [13] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge: The MIT Press, 1992. [14] E. R. Tufte, The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press, 1983. [15] E. R. Tufte, Envisioning Information. Cheshire, CT: Graphics Press, 1990. [16] E. R. Tufte, Visual Explanations: Images and Quantities, Evidence and Narrative. Cheshire, CT: Graphics Press, 1997. [17] J. P. Rosca and D. H. Ballard, “Rooted-Tree Schemata in Genetic Programming,” in Advances in Genetic Programming 3, L. Spector, et al., Eds. Cambridge: The MIT Press, 1999, pp. 243–271. [18] W. B. Langdon and R. Poli, Foundations of Genetic Programming. Berlin: Springer-Verlag, 2002. [19] J. M. Daida and A. Hilss, “Identifying Structural Mechanisms in Standard Genetic Programming,” in GECCO ’03, E. Cantú-Paz, et al., Eds. Berlin: Springer-Verlag, 2003, pp. 1639–1651. [20] J J. Kim and B Y. Zhang, “Effects of Selection Schemes in Genetic Programming for Time Series Prediction,” in CEC ’99, vol. 1. Piscataway: IEEE Press, 1999, pp. 252–258. [21] B Y. Zhang and J J. Kim, “Comparison of Selection Methods for Evolutionary Computation,” Evolutionary Optimization, vol. 2, pp. 55–70, 2000. [22] T. Bickle and L. Thiele, “A Mathematical Analysis of Tournament Selection,” in ICGA ’95, L. J. Eshelman, Ed. San Francisco: Morgan Kaufmann Publishers, 1995, pp. 9–16. [23] T. Motoki, “Calculating the Expected Loss of Diversity of Selection Schemes,” Evolutionary Computation, vol. 10, pp. 397–422, 2002. 1232