Ordinal Optimization and Simulation

Yu-Chi Ho, Harvard University, Cambridge, MA 02138
Christos G. Cassandras, Boston University, Boston, MA 02215
Chun-Hung Chen, University of Pennsylvania, Philadelphia, PA 19104-6315
Liyi Dai, Washington University, St. Louis, MO 63130

Abstract

Simulation plays a vital role in analyzing stochastic systems, particularly in comparing alternative system designs with a view to optimizing system performance. Using simulation to analyze complex systems, however, can be both prohibitively expensive and time consuming, so efficiency is a key concern in applying simulation to optimization problems. Ordinal optimization has emerged as an effective approach for significantly improving the efficiency of simulation and optimization: one can achieve an exponential convergence rate when using ordinal optimization on simulation problems, and there are already several success stories. In this paper we introduce the idea of ordinal optimization and report some recent advances in this research. Details of an extension of ordinal optimization to a class of resource allocation problems are given as well.

Keywords: Discrete-event simulation, ordinal optimization, discrete-event dynamic systems, resource allocation problem.

Corresponding author: Professor Chun-Hung Chen, Dept. of Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104-6315, USA. Fax: 215-573-2065, Tel: 215-898-3967, Email: chchen@seas.upenn.edu

1 Introduction

It can be argued that OPTIMIZATION, in the general sense of making things BETTER, is the principal driving force behind any prescriptive scientific and engineering endeavor, be it operations research, control theory, or engineering design. It is also true that the real world is full of complex decision and optimization problems that we cannot solve. While the literature on optimization and decision making is huge, much of the concrete analytical results and success stories are associated with what may be
called Real-Variable-Based methods. The idea of successive approximation to an optimum (say, a minimum) by sequential improvements based on local information is often captured by the metaphor "skiing downhill in a fog". The concepts of gradient (slope), curvature (valley), and trajectories of steepest descent (fall line) all require the notion of derivatives/gradients and are based on the existence of a more or less smooth response surface. There exist various first- and second-order algorithms of feasible directions for the iterative determination of the optimum (minimum) of an arbitrary multi-dimensional response or performance surface. A considerable number of major success stories exist in this genre, including the Nobel-prize-winning work on linear programming; it is not necessary to repeat or even reference these here. On the other hand, we submit that the reason many real-world optimization problems remain unsolved is partly the changing nature of the problem domain in recent years, which makes calculus- or real-variable-based methods less applicable. For example, a large number of human-made system problems, such as manufacturing automation, communication networks, computer performance, and general resource allocation problems, involve combinatorics rather than real analysis, symbols rather than variables, discrete instead of continuous choices, and synthesizing a configuration rather than proportioning the design parameters. Such problems are often characterized by lack of structure, the presence of large uncertainties, and enormously large search spaces. Performance evaluation, optimization, and validation of such systems are daunting, if not impossible, tasks. Most analytical tools, such as queueing network theory or Markov chains, fall short due to modeling or computational limitations. Simulation becomes the general tool of choice or last resort. Furthermore, optimization for such problems seems to call for a general search of the performance terrain or
response surface, as opposed to the "skiing downhill in a fog" metaphor of real-variable-based performance optimization. Arguments for change can also be made on the technological front. Sequential algorithms were often dictated by the limited memory and centralized control of earlier generations of computers. With the advent of modern general-purpose parallel computing and the essentially unlimited size of virtual memory, distributed and parallel procedures on a network of machines can work hand-in-glove with the search-based method of performance evaluation. It is one of the theses of this paper to argue for such a complementary approach to optimization via simulation models. We hasten to add that we fully realize the distinction we make here is not absolutely black and white; a continuum of problem types exists. Similarly, there is a spectrum in the nature of optimization variables or search spaces, ranging from continuous to integer to discrete to combinatorial to symbolic. If we accept the need for a search-based method as a complement to analytical techniques, then we can argue that quickly narrowing the search for optimum performance to a "good enough" subset of the design universe is more important than accurately estimating the values of system performance during the initial stages of the optimization process. We should compare order first and estimate value second, i.e., ordinal optimization comes before cardinal optimization. Colloquial expressions such as "ballpark estimate", "80/20 solution", and "forest vs. trees" express the same sentiment. Furthermore, we shall argue that our preoccupation with the "best" may be only an ideal that will always be unattainable or not cost effective. Real-world solutions to real-world problems will involve compromise for the "good enough". The purpose of this paper is to establish the distinct advantages of the softer approach of ordinal optimization for search-based problems, to analyze some of its
general properties, and to show the many orders of magnitude of improvement in computational efficiency that are possible under this mindset; finally, to advocate the use of this mindset as an additional tool in the arsenal of soft computing and a complement to the more traditional cardinal approach to optimization and engineering design. In the next section, we formulate the general simulation and optimization problems and point out the major limitation of traditional simulation approaches. Ordinal optimization is introduced in Section 3. In Section 4, we present the theoretical analysis showing that ordinal optimization can achieve an exponential convergence rate. Section 5 gives a budget allocation technique which can further improve the efficiency of the exponential convergence of ordinal optimization. In Section 6, we present several success stories of the application of ordinal optimization; in particular, details of an extension of ordinal optimization to a class of resource allocation problems and its application to a network control problem are given. Section 7 concludes this paper.

2 Problem Statement

Suppose we have a complex stochastic system. The performance of this system is conceptually given by J(\theta), where J is the expected performance measure and \theta collects the system design parameters, which may be continuous, discrete, combinatorial, or even symbolic. A general problem of stochastic optimization can be defined as

    \min_{\theta \in \Theta} J(\theta) \equiv E[L(\theta, \xi)]    (2.1)

where the search space \Theta is an arbitrary, huge, structureless but finite set; \theta is the system design parameter vector; J is the performance criterion, the expectation of the sample performance L, a functional of \theta and of \xi, a random vector representing the uncertain factors in the system. However, there are some basic limitations of search-based methods which require the use of simulation to estimate the performances J(\theta) in Eq. (2.1). There are two aspects to this problem: the "stochastic" part and the "optimization" part Of
course, here one is mindful of the business dictum, used by Mrs. Fields for her enormously successful cookie shops, which says "Good enough never is"; this, however, can be reconciled within the frame of reference of the optimization problem. The "stochastic" aspect has to do with the problem of performing the expectation numerically, since the functional L(\theta, \xi) is available only in the form of a complex calculation via simulation. The standard approach is to estimate E[L(\theta, \xi)] by sampling, i.e.,

    E[L(\theta, \xi)] \approx \hat{J}(\theta) = \frac{1}{t} \sum_{i=1}^{t} L(\theta, \xi_i)    (2.2)

where \xi_i represents the i-th sample of \xi. The ultimate accuracy (confidence interval) of this estimate cannot be improved upon faster than 1/t^{1/2}, the result of averaging i.i.d. noise. This rate can be quite adequate or even very good for many problems, but not good enough for the class of complex simulation-based performance evaluation problems mentioned above. Here each sample of L(\theta, \xi_i) requires one simulation run, which for the type of complex stochastic systems we are interested in can take seconds or minutes even on the fastest computer, not to mention the setup cost of programming the simulation. Millions or billions of required samples of L(\theta, \xi_i) for different values of \theta may become an infeasible burden. Yet there is no getting away from this fundamental limit of 1/t^{1/2}: one order of magnitude increase in accuracy requires two orders of magnitude increase in sampling cost. The second limitation has to do with the "optimization" part. Traditionally, analysis again plays an important part in numerical optimization, whether in convergence, gradient computation, or general hill climbing. Again these tools, such as infinitesimal perturbation analysis (IPA), take advantage of the structure and real-variable nature of the problem to devise effective algorithms for optimization. However, when we exhaust such advantages and the remaining problem becomes structureless and \Theta is totally arbitrary as above, we encounter the limitation
introduced by combinatorial explosion, the so-called NP-Hard Limitation. In such cases, a more or less blind search is the only alternative. While it is often easy to prove probabilistic convergence of a random search, it is very hard to improve the search efficiency. An example will illustrate this. Suppose \Theta is of size 10^{10}, which is small by combinatorial standards, and we are allowed to take 10,000 samples of J(\theta) uniformly in \Theta. A natural question is: what is the probability that at least one of these samples belongs to the top 50, top 500, or top 5000 designs of \Theta? I.e.,

    P{at least one of the 10,000 samples is in the top-g} = 1 - (1 - g/10^{10})^{10,000}    (2.3)

which turns out to be 0.00004999, 0.0004998, and 0.004988, respectively. This means that for a simulation-based optimization problem of any complexity, a simple random search is not an effective approach: too much computation for too little return. For example, in the case of a single-input single-output static decision rule, the number of possibilities is the number of output possibilities raised to the power of the number of input possibilities, an extremely large number. To make matters worse, the No-Free-Lunch theorem of Wolpert and Macready (IEEE Transactions on Evolutionary Computation, Vol. 1, 1997) says that without structural information no algorithm can do better on average than blind search. These two difficulties, the 1/t^{1/2} limit and the NP-Hard Limitation, are basic for general stochastic optimization problems of the type we are interested in. Each one can induce an insurmountable computational burden, and to make matters worse, their effects are multiplicative rather than additive. No amount of theoretical analysis can help once we are faced with these limitations. This is where ordinal optimization comes in.

3 Ordinal Optimization

The idea of ordinal optimization (OO) is to effect a strategic change of goals. It is based on two tenets:

(a) "Order" converges exponentially fast while "value" converges at rate 1/t^{1/2}
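Before elaborating on these tenets, the blind-sampling arithmetic in Eq. (2.3) above is easy to check numerically. A minimal Python sketch (the design-space size 10^{10} and sample count 10,000 are the values from the example above):

```python
# Probability that at least one of N_SAMPLES uniform draws from a design
# space of size N_SPACE lands among the top-g designs, Eq. (2.3):
#     P = 1 - (1 - g / N_SPACE) ** N_SAMPLES
N_SPACE = 10**10     # size of the design space Theta, from the example
N_SAMPLES = 10_000   # number of uniform samples of J(theta)

def p_top_g(g: int) -> float:
    """Eq. (2.3): P{at least one sample is in the top-g}."""
    return 1.0 - (1.0 - g / N_SPACE) ** N_SAMPLES

for g in (50, 500, 5000):
    print(f"top-{g:4d}: {p_top_g(g):.7f}")
# Matches the values quoted in the text (about 0.00005, 0.0005, 0.005):
# a 10,000-sample blind search almost surely misses even the top 5000.
```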
In other words, it is much easier to determine whether or not alternative A is better than B than to determine "A - B = ?". This is intuitively reasonable if we think of holding two identically looking packages, one in each hand, and trying to determine which package weighs more, versus estimating the difference in weight between the two. More quantitatively, suppose we observe A and B through i.i.d. zero-mean additive normal noise. The chance of mis-ordering A and B is roughly proportional to the area under the overlapping tails of the two noise density functions centered at A and B. This notion can be made more precise. If A and B plus noise are the performance value estimates of two simulation experiments, then under the best conditions the estimates cannot converge to the true values faster than 1/t^{1/2}, where t is the length or number of replications of the simulation experiments; but the relative order of A vs. B converges exponentially fast (Dai 1996, Xie 1998). Furthermore, cardinal (value) estimation may not always be necessary. In many cases, we simply try to determine which is the better alternative without necessarily knowing the detailed consequences of each alternative, i.e., we emphasize the choice (order) rather than estimating the utility (value) of the choices.

(b) Goal softening eases the computational burden of finding the optimum

Traditional optimization determines a single alternative in \Theta and hopes that it coincides with the true optimum; the metaphor of hitting one speeding bullet with another comes to mind. Instead, if we are willing to settle (a euphemism for retreat) for the set of "good enough" alternatives with high probability, e.g., any of the top-n% of the choices 95% of the time, then this probability improves dramatically with the size of these sets (i.e., with n). It is like trying to hit a truck with a shotgun. Let us formalize this notion a bit
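The mis-ordering picture behind tenet (a) can also be checked with a tiny Monte Carlo experiment. The sketch below assumes an illustrative performance gap of 0.2 observed through unit-variance Gaussian noise (these numbers are not from the paper); it uses the fact that the average of t unit-variance replications is distributed N(mean, 1/t):

```python
import math
import random

random.seed(1)
A_TRUE, B_TRUE = 0.0, 0.2   # true performances; B is worse (assumed gap)
TRIALS = 40_000

def misorder_prob(t: int) -> float:
    """Estimate P{the t-replication average of B looks better than A}."""
    bad = 0
    for _ in range(TRIALS):
        # Sample the t-replication averages directly: N(mean, 1/t).
        a = random.gauss(A_TRUE, 1.0 / math.sqrt(t))
        b = random.gauss(B_TRUE, 1.0 / math.sqrt(t))
        if b < a:
            bad += 1
    return bad / TRIALS

for t in (16, 64, 256):
    # Analytic mis-order probability: Phi(-gap * sqrt(t) / sqrt(2)).
    exact = 0.5 * math.erfc(0.2 * math.sqrt(t) / 2.0)
    print(f"t={t:4d}  misorder ~ {misorder_prob(t):.4f}  (exact {exact:.4f})")
```

While the value error of each estimate shrinks only like 1/t^{1/2}, the mis-order probability dies off exponentially in t, which is the essence of tenet (a).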
Suppose we define a subset of the design space \Theta as the "good enough" subset G, which could be the top-g designs or the top-(g/N)% of a sampled set of N designs. We denote the size of G by |G| = g. We now pick out (blindly or by some rule) another subset, called the selected subset S, with the same cardinality, i.e., |S| = |G| = g. We now ask: what is the probability that S contains at least k (k \le g) of the members of G, i.e., P{|G \cap S| \ge k}? This question is relevant since our rule for picking the set S clearly affects how many good designs we get in S: in the case of blind picking (a purely random selection of S), we expect the overlap to be smaller than when selecting based on some estimated order of the alternatives. We define P{|G \cap S| \ge k} as the alignment probability, which is a measure of the goodness of our selection rules. (Note that |S| and |G| need not be equal, in which case a slightly more general form of Eq. (3.1) below results.) Fig. 3.1 illustrates the concept; for perspective, we have also indicated the traditional optimization view, where the subsets G and S are both singletons.

[Figure 3.1: Generalized concept for optimization: the search space \Theta, the good enough subset G, the selected subset S, their overlap G \cap S, the optimum, and the estimated optimum.]

For the case of blind picking, which can be considered as observing the relative orders through noise of infinite variance (i.e., any design may be taken as a good enough design and vice versa), this alignment probability is given by Ho et al. (1992):

    P{|G \cap S| \ge k} = \sum_{i=k}^{g} \frac{\binom{g}{i} \binom{N-g}{g-i}}{\binom{N}{g}}    (3.1)

For example, if N = 200, g = 12, and k = 1, then P{|G \cap S| \ge k} > 0.5; when N = 200, g = 32, and k = 1, P is almost 1. We can easily calculate P for other values of N, g, and k; one example, for N = 1000, is illustrated in Figure 3.2.

[Figure 3.2: Alignment probability P vs. g, parameterized by k = 1, 2, ..., 10, for N = 1000; P rises from 0 toward 1 as g grows from 10 to 200.]

Once again this
shows that by blindly picking some 50-130 choices out of 1000 we can get quite high values of P. Remember, these curves are for blind picks; if we do not pick blindly, i.e., the variance of the estimation error is not infinite and estimated good designs have a better chance of being actual good designs, then we should definitely do better. Suppose now that we no longer pick designs blindly, but according to their estimated performance, however approximate, e.g., the selected subset S is the estimated top-g or top-(g/N)% of the designs. It turns out we can no longer determine P in closed form, but P can easily be calculated experimentally via a simple Monte Carlo experiment as a function of the noise variance \sigma^2, N, g, and k, i.e., the response surface P{|G \cap S| \ge k; \sigma^2, N}. For example, we can assume a particular form for J(\theta), e.g., J(\theta_i) = i, in which case the best design is \theta_1 (we are minimizing), the worst design is \theta_N, and the ordered performances increase linearly. For any finite i.i.d. estimation noise w, we can implement \hat{J}(\theta) = J(\theta) + w and directly check the alignment between the observed and true order. More generally, this alignment probability can be made precise and quantitative (Lau and Ho 1997, Lee et al. 1998) and shown to be exponentially increasing with respect to the sizes of G and S. Consequently, by combining the two basic ideas (a) and (b) of OO in this section, several orders of magnitude of easing of the computational burden can be achieved. In fact, some special cases of the alignment probability are also called the probability of correct selection, P{CS}, in the parlance of the simulation literature. There exists a large literature on assessing P{CS} based on classical statistical models. Goldsman and Nelson (1994) provide an excellent survey of available approaches to ranking, selection, and multiple (the selection-by-estimated-performance scheme above can be characterized as the Horse Race selection rule: we estimate the performance of ALL designs and
pick the estimated top-g designs) comparisons (e.g., Goldsman, Nelson, and Schmeiser 1991; Gupta and Panchapakesan 1979). In addition, Bechhofer et al. (1995) provide a systematic and more detailed discussion of this issue. However, most of these approaches are suitable only for problems with a small number of designs (e.g., Goldsman and Nelson (1994) suggest 2 to 20 designs). The question "Isn't OO just reinventing the wheel of the ranking-and-selection (R&S) literature in statistics?" may occur to the reader at this point. First, there is a scale difference: in OO we deal with design spaces of size billions times billions, while R&S typically deals with tens of choices. Second, OO does not address questions such as the distance between the best and the rest, a cardinal notion, or whether the observed order is the maximum likelihood estimate of the actual order, an elegant concept in the R&S literature but not very useful in the OO context. Also, the notion of softening, a main tenet of OO, is not present in the R&S literature. Lastly, OO is complementary to other tools of computational intelligence such as fuzzy logic and genetic algorithms. In a genetic algorithm, one must evaluate the fitness of a population of candidate designs; it is implicitly assumed in almost all GA literature that this evaluation is simple to carry out. For the complex simulation problems we have in mind here, this is not the case. Thus, taking advantage of the ability of OO to quickly separate the good from the bad, one can extend the applicability of GA to problems heretofore not computationally feasible. Similarly, in OO we have defined concepts such as "good enough" or high alignment probability as crisp notions; it is straightforward to fuzzify these definitions and apply OO to fuzzy systems problems. In other words, "softening" and "fuzzification" are distinct and complementary concepts. Much remains to be done. In the next section, we will show the fast convergence of
ordinal optimization theoretically.

4 Convergence of Ordinal Comparison

In this section, we provide a theoretical investigation of the efficiency of ordinal comparison. Without loss of generality, we say that design \theta_i is better than \theta_j if J(\theta_i) < J(\theta_j). For notational simplicity, we assume in this section that the N designs are indexed in such a way that

    -\infty < J(\theta_1) \le J(\theta_2) \le \ldots \le J(\theta_N) < \infty.

We are interested in finding a design in G, the set of the M (M \le N) true best designs in \Theta. Let H = \Theta - G. We use \hat{J}(\theta, t) to denote an estimate of J(\theta) based on a simulation of length t. In order to characterize the rate of convergence of ordinal comparison, we define the indicator process

    I(t) = 1 if \min_{\theta \in G} \hat{J}(\theta, t) \le \min_{\theta \in H} \hat{J}(\theta, t), and I(t) = 0 otherwise.

Then I(t) is equal to 1 if the observed best design is a good design in G and equal to 0 otherwise; I(t) is a function of t indicating whether the observed best design is one of the good designs. We next examine the convergence of P{I(t) = 1} and P{I(t) = 0} as a function of the simulation length (or number of simulation samples) t. The convergence rate of I(t) will be taken as that of P{I(t) = 1} or P{I(t) = 0}.

A lower bound. In many cases, such as finite-state Markov chains and the averaging of i.i.d. random variables, the variance of an estimate \hat{J}(\theta, t) decays to zero at rate O(1/t). Such a decay rate of the variance can be used to derive a lower bound on the rate of convergence of the indicator process, stated precisely in the following theorem.

Theorem 4.1 (Dai 1996). Assume that

(A4.1) the stochastic system under simulation is ergodic in the sense that lim_{t \to \infty} L(\theta, \xi, t) = J(\theta) a.s. for any \theta \in \Theta, and that the variance of the estimate \hat{J}(\theta, t) decays to zero at rate O(1/t) for all \theta \in \Theta, i.e., Var(\hat{J}(\theta, t)) = O(1/t).

Then P{I(t) = 1} = 1 - O(1/t) and P{I(t) = 0} = O(1/t); in other words, the indicator process I(t) converges to 1 at a rate of at least O(1/t). #

Assumption (A4.1) is very mild and is satisfied by most
stochastic systems. This assumption guarantees that J(\theta) can be estimated with arbitrary accuracy by increasing the simulation duration of any one sample trajectory.

Exponential convergence. Theorem 4.1 gives a very conservative bound for general cases. In many situations, we know much more about the structure and distribution of X(\theta, t), and quite often the rate of convergence can be proven faster than that in Theorem 4.1. In particular, we consider the simulation of general regenerative systems.

Definition 4.2 (Kalashnikov 1994, p. 19). A process {X(\theta, t)} is said to be regenerative (in the classic sense) if there is an increasing sequence of nonnegative finite random times {\tau_i(\xi), i \ge 0} such that for each i:
(i) {X(\theta, \tau_i(\xi) + t), \tau_k(\xi) - \tau_i(\xi), t \ge 0, k \ge i} are identically distributed;
(ii) {X(\theta, \tau_i(\xi) + t), \tau_k(\xi) - \tau_i(\xi), t \ge 0, k \ge i} does not depend on {X(\theta, t), \tau_j(\xi), t < \tau_i(\xi), 0 \le j < i}. #

The sequence {\tau_i(\xi), i \ge 1} is a sequence of regeneration points. We consider two types of performance measures:

    \hat{J}_1(\theta, t) = \frac{1}{t} \int_0^t \phi(X(\theta, \xi, u)) \, du  and  \hat{J}_2(\theta, t) = \frac{\sum_{i=1}^{K(t,\xi)} \int_{\tau_{i-1}(\xi)}^{\tau_i(\xi)} \phi(X(\theta, \xi, u)) \, du}{\sum_{i=1}^{K(t,\xi)} T_i(\xi)}    (4.1)

where \phi(x) \in R; T_i(\xi) = \tau_i(\xi) - \tau_{i-1}(\xi), i \ge 1, is the length of the i-th regeneration cycle; K(t, \xi) = \max\{i: \tau_i(\xi) \le t, i \ge 0\} is the number of regeneration cycles on [0, t]; and \bar{K}(t, \xi) = K(t, \xi) + 1 is the first passage time process. For convenience, we define T_0(\xi) = \tau_0(\xi). Performance measures of the form (4.1) appear frequently in simulation. They are essentially identical except for the tail terms; however, these tail terms typically converge at the rate O(1/t), slower than the rate of ordinal comparison as we shall show, and therefore they have to be taken into consideration. For every \theta \in \Theta, we assume that (A4.1) holds and that:

(A4.2) the cycle time T_1(\theta) has a finite, continuous moment generating function E[e^{s T_1(\theta)}] in a neighborhood of s = 0, s \ge 0;

(A4.3) the cycle time T_1(\theta) is not degenerate in the
sense of limt0 P{T1()  t} < (A4.4) The function (x )  R is bounded and | (x ) |  B, < B <  Assumptions (A4.2) and (A4.4) are technical and satisfied in many cases (Dai 1996) Assumption (A4.3) is very mild Note that (A4.2) and (A4.3) imply < E[T1()] <  Regenerative systems constitute a large class of systems For regenerative systems, the rate of convergence of ordinal comparison in exponential as stated in the following theorem Theorem 4.3 (Dai 1996) Suppose (A4.1)-(A4.4) holds Then there exists a positive number  > such that P{I(t) = 1} = - O(e-t), P{I(t) = 0} = O(e-t), for both of the performance measures Jˆ ( , t ) and Jˆ ( , t ) , in other words, the indicator process converges at an exponential rate # Relaxation of assumptions (A4.1)-(A4.4) in terms of large deviation principles is provided in Dai and Chen (1997) The constant  in the rate of convergence is given by the following expression   ˆ ˆ lim log E e st  min G J   ,t   min H J  ,t    =  sinf [ 0, ) t   t Computing Budget Allocation for Further Efficiency Improvement 10 While ordinal optimization could significantly reduce the computational cost for design selection, there is potential to further improve its performance by intelligently determining the number of simulation samples or replications among different designs Intuitively, to ensure a high alignment probability, a larger portion of the computing budget should be allocated to those designs which are potentially good designs In other words, a larger number of simulations must be conducted with those critical designs in order to reduce estimator variance On the other hand, limited computational effort should be expanded on non-critical designs that have little effect on identifying the good designs even if they have large variances In doing so, less computational effort is spent on simulating non-critical designs and more computational effort is spent on simulating critical designs; hence, the overall simulation 
efficiency is improved. Ideally, we want to choose the number of simulation samples for all designs so as to maximize simulation efficiency within a given computing budget. In this section, we present a technique called optimal computing budget allocation (OCBA) which makes use of this idea. Let t_\theta be the number of simulation samples of design \theta. If simulation is performed on a sequential computer and the difference in the computation costs of simulating different designs is negligible, the total computation cost can be approximated by \sum_{\theta \in \Theta} t_\theta. One goal is to choose t_\theta for all \theta such that the total computation cost is minimized, subject to the restriction that the confidence level, defined by the alignment probability, is greater than some satisfactory level:

    \min_{t_\theta} \sum_{\theta \in \Theta} t_\theta   s.t. alignment probability \ge P*    (5.1.a)

where P* is a user-defined confidence level requirement. Alternatively, one may choose t_\theta such that the alignment probability is maximized subject to a limited computing budget T:

    \max_{t_\theta} [alignment probability]   s.t. \sum_{\theta \in \Theta} t_\theta = T    (5.1.b)

Three major issues arise in solving (5.1):
- There is no closed-form expression for the alignment probability.
- The alignment probability at t_\theta, \theta \in \Theta, can be computed only after all t_\theta simulation samples for all designs have been performed.
- The t_\theta, \theta \in \Theta, are integers, and the number of combinations of different t_\theta is large even for a moderate size of \Theta.

To solve the budget allocation problem in (5.1), the alignment probability must be easy to estimate; an extra Monte Carlo simulation for estimating it would be too expensive in this setting. On the other hand, most existing ranking and selection approaches are suitable only for problems with a small number of designs (e.g., Goldsman and Nelson (1994) suggest 2 to 20 designs). Using a Bayesian model, Chen (1996) gives an estimation technique to quantify the confidence level of ordinal comparison when the number of designs is large. In addition to the confidence
probability, this approach also provides sensitivity information for each design, which is useful in solving the allocation problem in (5.1). It is convenient to let (1), (2), ..., (N) denote the indices (order statistics) such that

    \frac{1}{t_{(1)}} \sum_{i=1}^{t_{(1)}} L(\theta_{(1)}, \xi_i) \le \frac{1}{t_{(2)}} \sum_{i=1}^{t_{(2)}} L(\theta_{(2)}, \xi_i) \le \ldots \le \frac{1}{t_{(N)}} \sum_{i=1}^{t_{(N)}} L(\theta_{(N)}, \xi_i)    (5.2)

Under this notation, design (1) has the smallest sample mean and is the observed best design. We have the following estimator of P{CS} from Chen (1996).

Theorem 5.1. Let \tilde{J}_\theta denote the random variable whose probability distribution is the posterior distribution of the expected performance of design \theta under a Bayesian model. Assume \tilde{J}_i and \tilde{J}_j, i \ne j, are independent (i.e., the simulations for different designs are independent). For a minimization problem, P{|G \cap S| \ge k} is bounded below by a product of pairwise posterior comparison probabilities of the form P{\tilde{J}_{(1)} < \tilde{J}_{(j)}}; this bound is called the Approximate Probability of Correct Selection (APCS). #

Under the Bayesian model, the posterior distribution p(\tilde{J}_\theta) combines information from the prior distribution and the simulation results {L(\theta, \xi_i), i = 1, 2, ..., t_\theta}. Furthermore, under a mild normality assumption, if the variance \sigma_\theta^2 is known (Bernardo and Smith 1994),

    \tilde{J}_\theta \sim N\!\left( \frac{1}{t_\theta} \sum_{i=1}^{t_\theta} L(\theta, \xi_i), \; \frac{\sigma_\theta^2}{t_\theta} \right).

If the variance is unknown, \sigma_\theta^2 can be replaced by the sample variance and \tilde{J}_\theta becomes t-distributed (Chick 1997). For ease of explanation, we concentrate in the rest of this section on the case |S| = |G| = k = 1, in which case

    APCS = \prod_{j=2}^{N} P\{\tilde{J}_{(1)} < \tilde{J}_{(j)}\} = \prod_{j=2}^{N} \Phi\!\left( \frac{ \frac{1}{t_{(j)}} \sum_{i=1}^{t_{(j)}} L(\theta_{(j)}, \xi_i) - \frac{1}{t_{(1)}} \sum_{i=1}^{t_{(1)}} L(\theta_{(1)}, \xi_i) }{ \sqrt{ \sigma_{(j)}^2 / t_{(j)} + \sigma_{(1)}^2 / t_{(1)} } } \right)    (5.3)

where \Phi is the standard normal cumulative distribution function. The APCS in (5.3) estimates the probability that the selected observed best design is indeed the true best design. Note that the computation of APCS is simply a product of pairwise comparison probabilities, which is easy to compute. Another critical issue in
solving (5.1) is a cost-effective method for estimating the sensitivity information, or more specifically, the new P{CS} if \Delta additional simulation samples are performed on design \theta.

Definition 5.2. Define EPCS(t_1, t_2, ..., t_{\theta-1}, t_\theta + \Delta, t_{\theta+1}, ..., t_N) as the estimated P{CS} if \Delta additional simulation samples are performed on design \theta. EPCS is computed using the statistical information available after t_1, t_2, ..., t_N simulations have been completed for designs 1, ..., N, respectively.

Let \tilde{J}_\theta \sim N( \frac{1}{t_\theta} \sum_{i=1}^{t_\theta} L(\theta, \xi_i), \; \frac{\sigma_\theta^2}{t_\theta + \Delta} ) for the design receiving the additional samples. If the observed sample means are ranked as in (5.2), Chen et al. (1996) give a simple and effective way to estimate the EPCS:

    if \theta \ne (1):  EPCS(t_1, ..., t_\theta + \Delta, ..., t_N) \approx P\{\tilde{J}_{(1)} < \tilde{J}_\theta\} \prod_{j \ge 2, (j) \ne \theta} P\{\tilde{J}_{(1)} < \tilde{J}_{(j)}\}, and

    if \theta = (1):  EPCS(t_1, ..., t_\theta + \Delta, ..., t_N) \approx \prod_{j=2}^{N} P\{\tilde{J}_{(1)} < \tilde{J}_{(j)}\},

where design (1) is the observed best design, with the smallest sample mean. Note that the expression for EPCS is similar to that of APCS in Eq. (5.3) and is easy to compute. EPCS provides sensitivity information about how APCS will change if \Delta additional simulation samples are performed on design \theta. Since estimation of the alignment probability and its sensitivity information is now feasible, we can deal with the budget allocation problem in (5.1). Our approach builds on the idea that the computing budget should be allocated in the way that best improves the overall confidence level. The approach for the problem in (5.1.a) is outlined as follows:

Step 0. Perform simulations and obtain n_0 samples for all designs; l \leftarrow 0; t_1^l = t_2^l = \ldots = t_N^l = n_0.
Step 1. If APCS(t_1^l, t_2^l, ..., t_N^l) \ge P*, stop.
Step 2. Increase the computing budget (i.e., the number of additional simulations) by \Delta and allocate this budget across the candidate designs by solving the subproblem:
    \max_{\delta_1^l, ..., \delta_N^l} EPCS(t_1^l + \delta_1^l, t_2^l + \delta_2^l, ..., t_N^l + \delta_N^l)   s.t. \sum_{i} \delta_i^l = \Delta and \delta_i^l \ge 0 for all i.
Step 3. Perform simulations and obtain additional \delta_i^l
For the maximization problem in (5.1.b), the above algorithm still applies, except that the stopping criterion in Step 1 must be replaced by "If Σ_{i=1}^{N} t_i^l ≥ T, stop." A last issue in solving the budget allocation problem in (5.1) is the development of an efficient and effective procedure for the subproblem in Step 2. We have explored two major approaches: (a) solving the nonlinear program directly, and (b) replacing the objective function with an approximation and then solving it analytically. The two approaches are briefly described as follows.

Nonlinear Programming Approach

Here we treat the variables as continuous and apply nonlinear programming techniques (Luenberger 1984, Nash and Sofer 1996) to obtain an approximate solution to the subproblem in Step 2. The gradient of APCS with respect to t_θ can be estimated by

∂APCS/∂t_θ ≈ [ EPCS(t_1, t_2, …, t_θ + Δ, t_{θ+1}, …, t_N) − APCS(t_1, t_2, …, t_θ, t_{θ+1}, …, t_N) ] / Δ.

Chen et al. (1997) develop an effective OCBA technique by integrating a simple steepest-descent method. Numerical tests show that OCBA can significantly reduce simulation time compared with well-known computing budget allocation techniques such as the two-stage procedures of Rinott (1978) and Dudewicz and Dalal (1975).

Objective Function Approximation Approach

The nonlinear programming approach incurs an extra computational cost to search iteratively for the solution of the subproblem in Step 2. Another approach we have explored is to replace the objective function with an approximation and then find an analytical solution to the approximate problem. The advantage is that there is no need to iteratively apply a nonlinear programming technique in Step 2 to obtain an optimal solution. Chen et al. (1998c) approximate EPCS using Chernoff bounds.
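The resulting closed-form budget split (stated as Theorem 5.3 below) can be sketched directly. The ratio rules written out here follow our reconstruction of the theorem; the helper name and inputs are illustrative.

```python
import math

def ocba_fractions(means, stds):
    # b is the observed best design; delta_i = mean_i - mean_b
    b = min(range(len(means)), key=lambda i: means[i])
    w = [0.0] * len(means)
    for i in range(len(means)):
        if i != b:
            # t_i / t_j = (stds[i] * delta_j / (stds[j] * delta_i))**2
            w[i] = (stds[i] / (means[i] - means[b])) ** 2
    # t_b = std_b * sqrt( sum over i != b of t_i**2 / stds[i]**2 )
    w[b] = stds[b] * math.sqrt(sum((w[i] / stds[i]) ** 2
                                   for i in range(len(means)) if i != b))
    total = sum(w)
    return [x / total for x in w]
```

With equal variances and equally distant competitors this reproduces the sqrt(N−1) rule of Remark 5.5, and for N = 2 it reduces to the sigma-ratio allocation of Remark 5.4.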
An asymptotically optimal solution can be found and is summarized in the following theorem.

Theorem 5.3. Given a total simulation budget T to be allocated to N competing designs, the Approximate Probability of Correct Selection (APCS) is asymptotically maximized when
(a) t_(1) = σ_(1) ( Σ_{j=2}^{N} t_(j)²/σ_(j)² )^{1/2}, and
(b) t_i/t_j = ( σ_i δ_{(1),j} / ( σ_j δ_{(1),i} ) )² for any i, j ≠ (1),
where δ_{i,j} = (1/t_i) Σ_{s=1}^{t_i} L(θ_i, ω_s) − (1/t_j) Σ_{s=1}^{t_j} L(θ_j, ω_s). #

The budget allocation problems for some special cases have already been studied by other researchers, so it is worthwhile to compare the results in Theorem 5.3 with some well-known solutions. Remarks 5.4 and 5.5 show that our approach achieves the same results as those earlier approaches under the required special conditions; note that our approach also applies to much more general problems.

Remark 5.4. In the case N = 2, Theorem 5.3 yields t_(1)/t_(2) = σ_(1)/σ_(2). # This is identical to the well-known optimal allocation solution for N = 2.

Remark 5.5. Assume that (a) δ_{(1),θ} = δ for all θ = 1, 2, …, N, θ ≠ (1), and (b) σ_θ = σ for all θ. In this special case, Theorem 5.3 gives t_(1) = t_θ (N − 1)^{1/2} for θ ≠ (1), which is the same as the results in Bechhofer (1969) and Dunnett (1955). #

6 Applications and Extensions of Ordinal Optimization

As shown in Section 4, the convergence rate of ordinal comparison can be exponential for many stochastic problems: one can have good estimates of the order while the value estimates are still very poor. If the goal is to find good designs rather than an accurate estimate of the best performance value, which is true in many practical situations, it is advantageous to use ordinal comparison for selecting a good design. This idea has been applied to a 10-node network (Patsis et al. 1997), where it is shown that a good design can be identified with high probability using relatively short simulations; from a computational perspective, they demonstrate many orders of magnitude of speedup. The idea has been successfully applied to several other problems. For example, a heuristic algorithm selection problem is formulated as a stochastic
optimization problem in Chen et al. (1998b), and only a much smaller amount of simulation time is needed to identify a good heuristic algorithm. Chen et al. (1998a) show that ordinal optimization can be applied to the robot motion planning problem, identifying a good enough motion plan with a very short
computation time. Ho and Larson (1995) showed how to apply OO to rare-event simulations. In Deng and Ho (1999), ordinal optimization was used to search for the global optimum in a well-known open non-convex optimal control problem of 30 years' standing. In addition to the direct use of ordinal optimization, the ordinal comparison idea can also be integrated into other procedures to achieve a much faster convergence rate. For example, the Stochastic Comparison Algorithm integrates ordinal optimization into a recursive selection procedure and obtains a fast convergence rate (Gong et al. 1999). Ordinal optimization can also be combined with massively parallel computers to achieve speedup in the search for an optimum (Ho 1994). Additional applications and an interactive online demo can be found at the website hrl.harvard.edu/~ho/DEDS. In this paper, details of an extension of ordinal optimization to a class of resource allocation problems and its application to a network control problem are given below.

In the resource allocation problems studied in Cassandras et al. (1998), there are r identical resources to be allocated over N user classes so as to optimize some system performance measure (objective function). Let s = [n_1, …, n_N] denote an allocation and S = {[n_1, …, n_N] : Σ_{i=1}^{N} n_i = r, n_i ∈ {1, …, r}} be the finite set of feasible allocations, where n_i is the number of resources allocated to user class i. Let J_i(s) be the class-i cost associated with the allocation vector s. Assuming that J_i(s) depends only on the number of resources assigned to class i, we can write J_i(s) = J_i(n_i). Then, the basic resource allocation problem is

(RA)  min_{s ∈ S} Σ_{i=1}^{N} β_i J_i(n_i),

where, without loss of generality for the purpose of our analysis, we may set β_i = 1 for all i. In a stochastic setting, the cost function J_i(s) is usually an expectation whose exact value is generally difficult to obtain, and one resorts to estimates of J_i(s) which may be obtained through simulation or through direct on-line observation of a system.
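The combinatorial size of S is what makes exhaustive evaluation hopeless. Counting the feasible allocations, which are the compositions of r into N positive parts (a standard stars-and-bars count), makes this concrete; the function name is ours.

```python
from math import comb

def n_allocations(r, n_classes):
    # number of ways to write r as an ordered sum of n_classes parts >= 1
    return comb(r - 1, n_classes - 1)
```

For the buffer allocation example later in this section (r = 24 resources, N = 6 classes) there are already 33,649 feasible allocations, each of which would need its own simulation study under a purely cardinal approach.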
We denote by L̃_i^t(s) an estimate of J_i(s) based on observing a sample path for a time period of length t. Such stochastic discrete resource allocation problems are hard due to the combinatorial explosion of the feasible allocation search space, as well as the absence of closed-form expressions for the cost function of interest.

The stochastic resource allocation algorithm considered here is represented as a Markov process {(s̃_k, C̃_k)}, where the vector s̃_k = [ñ_{1,k}, ñ_{2,k}, …, ñ_{N,k}] is the allocation after the kth step, C̃_k is a subset of user indices after the kth step, updated as shown below, and the "~" notation indicates that all quantities involved are based on estimates L̃_i^t(s) of J_i(s) obtained from a sample path of length t; in particular, the length of the sample path at the kth iteration is denoted by f(k) and the corresponding cost estimates by L̃_i^{f(k)}(·). With C_0 = {1, …, N}, after proper initialization, we have:

ñ_{i,k+1} = ñ_{i,k} − 1  if i = ĩ*_k and Δ̃_k(ĩ*_k, j̃*_k) > 0,
ñ_{i,k+1} = ñ_{i,k} + 1  if i = j̃*_k and Δ̃_k(ĩ*_k, j̃*_k) > 0,
ñ_{i,k+1} = ñ_{i,k}      otherwise,   (6.1)

where

ĩ*_k = arg max_{i ∈ C̃_k} ΔL̃_i^{f(k)}(ñ_{i,k}),   (6.2)
j̃*_k = arg min_{i ∈ C̃_k} ΔL̃_i^{f(k)}(ñ_{i,k}),   (6.3)
Δ̃_k(ĩ*_k, j̃*_k) = ΔL̃_{ĩ*_k}^{f(k)}(ñ_{ĩ*_k,k}) − ΔL̃_{j̃*_k}^{f(k)}(ñ_{j̃*_k,k} + 1),   (6.4)

C̃_{k+1} = C̃_k − {j̃*_k}  if Δ̃_k(ĩ*_k, j̃*_k) ≤ 0 and |C̃_k − {j̃*_k}| > 1,
C̃_{k+1} = C_0             if Δ̃_k(ĩ*_k, j̃*_k) ≤ 0 and |C̃_k − {j̃*_k}| = 1,
C̃_{k+1} = C̃_k             otherwise,   (6.5)

where ΔL̃_i^{f(k)}(n_i) = L̃_i^{f(k)}(n_i) − L̃_i^{f(k)}(n_i − 1). It is clear that equations (6.1)-(6.5) define a Markov process {(s̃_k, C̃_k)} whose state transition probability matrix is determined by ĩ*_k, j̃*_k, and Δ̃_k(ĩ*_k, j̃*_k). Briefly, ĩ*_k and j̃*_k estimate the users "most sensitive" and "least sensitive" to the removal of a resource among those users in the set C̃_k. Then, (6.1) forces the exchange of a resource from the most to the least sensitive user at the kth step of this process, provided the quantity Δ̃_k(ĩ*_k, j̃*_k) is strictly positive; otherwise, the allocation is unaffected, but the user with index j̃*_k is removed from the set C̃_k through (6.5). When this set is reduced to a single user, it is reset to C_0 and the procedure repeats. The quantity Δ̃_k(ĩ*_k, j̃*_k) is the estimated "potential cost reduction" when an actual change in allocation takes place.
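A deterministic sketch of this exchange process is given below, with exact marginal costs ΔJ_i standing in for the estimates ΔL̃; in an actual run these would be finite differences of simulation-based estimates. The allocation is kept at n_i ≥ 1 for every class, consistent with the feasible set S above, and all names are ours.

```python
def resource_exchange(delta_cost, n_classes, r, steps=200):
    # delta_cost(i, n) plays the role of dJ_i(n) = J_i(n) - J_i(n-1)
    alloc = [r - (n_classes - 1)] + [1] * (n_classes - 1)  # lopsided start
    C = set(range(n_classes))
    for _ in range(steps):
        i_star = max(C, key=lambda i: delta_cost(i, alloc[i]))  # (6.2)
        j_star = min(C, key=lambda i: delta_cost(i, alloc[i]))  # (6.3)
        # (6.4): estimated cost reduction of moving one unit i* -> j*
        gain = (delta_cost(i_star, alloc[i_star])
                - delta_cost(j_star, alloc[j_star] + 1))
        if gain > 0 and alloc[i_star] > 1:
            alloc[i_star] -= 1  # (6.1): move one resource i* -> j*
            alloc[j_star] += 1
        else:
            C.discard(j_star)   # (6.5): drop j* from the candidate set
            if len(C) <= 1:
                C = set(range(n_classes))  # reset to C_0
    return alloc
```

With identical classes and a strictly convex per-class cost (here the hypothetical marginal cost ΔJ(n) = −2^(−n), increasing in n as (A6.1) requires), the process walks from the worst-case start [r − N + 1, 1, …, 1] to the balanced allocation and then stays there.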
We will make the following three assumptions:

(A6.1) For all i = 1, …, N, J_i(n_i) is such that ΔJ_i(n_i + 1) > ΔJ_i(n_i), where ΔJ_i(n_i) = J_i(n_i) − J_i(n_i − 1), n_i = 1, …, r, with ΔJ_i(0) ≡ −∞ and ΔJ_i(r + 1) ≡ ∞.

(A6.2) For every i and every n_i, the estimate L̃_i^t(n_i) is ergodic as the sample path length increases, in the sense that lim_{t→∞} L̃_i^t(n_i) = J_i(n_i), a.s.

(A6.3) Let Δ_k(i, j) = ΔJ_i(ñ_{i,k}) − ΔJ_j(ñ_{j,k} + 1). For every Δ_k(ĩ*_k, j̃*_k) = 0, there is a constant p_0 > 0 such that P{Δ̃_k(ĩ*_k, j̃*_k) ≤ 0 | Δ_k(ĩ*_k, j̃*_k) = 0, (s̃_k, C̃_k) = (s, C)} ≥ p_0, for any k and any pair (s, C).

Note that (A6.3) is a technical condition guaranteeing that an estimate does not always give one-sided, biased, incorrect information. Also note that the results below do not require the optimal allocation to be unique: if several allocations exhibit optimal performance, the proposed scheme converges to a set of allocations and oscillates among the members of that set. Under (A6.1)-(A6.3), it was proven that s̃_k converges in probability to the optimal solution s* as long as f(k) → ∞ as k → ∞. Cassandras et al. (1998) give a detailed discussion and interpretation of the above process, and the proofs of the following two theorems.

Theorem 6.1. Suppose that Assumptions (A6.1)-(A6.3) hold and that the optimum s* is unique. Then the process described by equations (6.1)-(6.5) converges in probability to the optimal allocation s*. #
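The convergence-rate results below rest on the exponential convergence of ordinal comparison discussed in Section 4. For two designs observed with independent Gaussian noise, the misordering probability after t samples per design has a closed form that makes the exponential decay visible; this is a stylized illustration, not a computation from the paper.

```python
import math

def misorder_prob(delta, sigma, t):
    # The difference of the two sample means is N(-delta, 2*sigma**2 / t),
    # so the wrong order is observed with probability
    # Phi(-delta * sqrt(t) / (sigma * sqrt(2))).
    z = -delta * math.sqrt(t) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

Quadrupling the observation length roughly squares the error probability's order of magnitude, the signature of exponential decay rather than the O(1/sqrt(t)) decay typical of value (cardinal) estimates.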
Theorem 6.2. Suppose that (A6.1)-(A6.3) and the assumptions of the lower bound on the convergence rate of ordinal comparison (Section 4) hold. If f(k) ≥ k^{1+c} for some constant c > 0, then the process described by equations (6.1)-(6.5) converges almost surely to the global optimal allocation. #

Theorems 6.1 and 6.2 establish the convergence of the algorithm. For regenerative systems, stronger results regarding the rate of convergence have been obtained, as stated in the following theorem.

Theorem 6.3 (Dai et al. 1998). Suppose that Assumptions (A6.1)-(A6.3) hold and that for every pair i, j, i ≠ j, L̃_i^t(n_i) − L̃_j^t(n_j) is a regenerative process with regenerative cycle times satisfying (A4.1)-(A4.4). Then there exists a c > 0 large enough such that if f(k) ≥ c·ln(k) for sufficiently large k, the process described by equations (6.1)-(6.5) converges almost surely to the global optimal allocation. Furthermore, P{s̃_k = s*} = 1 − O(ln(k) e^{−βf(k)}) for some β > 0. #

Note that ordinal comparison is used in (6.1). Since ordinal comparison allows one to make a correct decision based on short simulations, the algorithm can quickly reach the neighborhood of the optimal solution. The fast convergence has been confirmed experimentally, as shown in Cassandras et al. (1998). Compared with conventional stochastic approximation methods, the algorithm also eliminates the difficult task of selecting an appropriate "step size."

A numerical example

Figure 6.1. Queueing system with N parallel servers.

As an application, we consider a buffer allocation problem for the queueing system shown in Figure 6.1, where each server represents a user and each buffer slot represents a resource to be allocated to a user. Jobs arrive at this system at rate λ and are routed to one of the N users with some probability p_i, i = 1, …, N. Each user processes jobs at rate μ_i, i = 1, …, N, and if a job is routed to a user with a full queue, the job is lost. Let J_i(n_i) be the individual job loss probability of the ith server (n_i is the number of buffers allocated to the ith server).
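Under the stated routing, each queue i behaves like an M/M/1 queue with finite buffer n_i and traffic intensity ρ_i = λp_i/μ_i, for which the loss probability has a standard closed form. With it, the symmetric optimum of the example can be checked by brute-force enumeration; this is a sketch under the assumption that J_i(n_i) is exactly this textbook blocking probability, which may differ from the paper's buffer accounting.

```python
from itertools import combinations

def mm1k_loss(n, rho):
    # blocking probability of an M/M/1 queue that can hold n jobs
    return (1.0 - rho) * rho ** n / (1.0 - rho ** (n + 1))

def best_allocation(K, N, rho):
    # enumerate all ways to split K buffer slots into N positive parts
    best, best_cost = None, float("inf")
    for cuts in combinations(range(1, K), N - 1):
        parts = [b - a for a, b in zip((0,) + cuts, cuts + (K,))]
        cost = sum(mm1k_loss(n, rho) for n in parts)
        if cost < best_cost:
            best, best_cost = parts, cost
    return best
```

With λ = 0.5, p_i = 1/6, and μ_i = 1 (so ρ = 1/12 per queue), K = 24, and N = 6, enumerating all 33,649 feasible splits returns [4, 4, 4, 4, 4, 4], confirming the symmetry argument.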
Our goal is to allocate all K available buffer slots to the users in order to minimize the objective function

min Σ_{i=1}^{N} J_i(n_i)  s.t.  Σ_{i=1}^{N} n_i = K.

It can be easily verified that Assumptions (A6.1)-(A6.3) are satisfied for this example (Cassandras et al. 1998). For this problem, one can directly apply the algorithm corresponding to the process (6.1)-(6.5). Standard concurrent estimation methods (e.g., see Cassandras and Panayiotou 1996) may be used to obtain all the estimates ΔL̃_i^{f(k)}(ñ_{i,k}) required by the algorithm from observation of a sample path under a given allocation [ñ_{1,k}, ñ_{2,k}, …, ñ_{N,k}]. For simplicity, in our simulation of this system we have assumed that the arrival process is Poisson with rate λ = 0.5, and all service times are exponential with rates μ_i = 1.0, i = 1, …, 6. Furthermore, all routing probabilities were chosen to be equal, i.e., p_i = 1/6, i = 1, …, 6. Figure 6.2 shows typical traces of the evolution of the algorithm for this system with N = 6 users and K = 24 buffer slots (resources). In this problem, it is obvious that the optimal allocation is [4, 4, 4, 4, 4, 4] due to symmetry. The symmetric setting was chosen only to facilitate the determination of the optimal allocation; the speed of convergence does not depend on the assumptions made in this example. The two traces of Figure 6.2 correspond to two different initial allocations. In this case, the simulation length is increased linearly, in steps of 3,000 events per iteration, and one can easily see that even if we start at one of the worst possible allocations (e.g., [19, 1, 1, 1, 1, 1]), the cost quickly converges to the neighborhood of the optimal point.

7 Conclusions

The development of OO is still in its infancy. By redefining what is desired in a simulation (e.g., order rather than value estimates, good-enough rather than optimal choices), we present a general method for easing the computational burden associated with complex stochastic optimization problems via simulation.
The methodology is easy to understand and, as a rule, does not require any additional computation for its use. Similar to the role of order statistics in sampling, it is totally complementary to any other tools one may be using.

References

1. Bechhofer, R. E., "Optimal Allocation of Observations when Comparing Several Treatments with a Control," Multivariate Analysis II, P. R. Krishnaiah, ed., Academic Press, 463-473, 1969.
2. Bechhofer, R. E., T. J. Santner, and D. M. Goldsman, Design and Analysis of Experiments for Statistical Selection, Screening, and Multiple Comparisons, John Wiley & Sons, 1995.
3. Bernardo, J. M., and A. F. M. Smith, Bayesian Theory, Wiley, 1994.
4. Cassandras, C. G., L. Dai, and C. G. Panayiotou, "Ordinal Optimization for Deterministic and Stochastic Discrete Resource Allocation," IEEE Trans. on Automatic Control, Vol. 43, No. 7, 881-900, 1998.
5. Cassandras, C. G., and C. G. Panayiotou, "Concurrent Sample Path Analysis of Discrete Event Systems," Proc. of the 35th IEEE Conf. on Decision and Control, 3332-3337, 1996.
6. Chen, C. H., "A Lower Bound for the Correct Subset-Selection Probability and Its Application to Discrete Event System Simulations," IEEE Trans. on Automatic Control, Vol. 41, No. 8, 1227-1231, August 1996.
7. Chen, H. C., C. H. Chen, L. Dai, and E. Yucesan, "New Development of Optimal Computing Budget Allocation for Discrete Event Simulation," Proc. of the 1997 Winter Simulation Conference, 334-341, December 1997.
8. Chen, C. H., H. C. Chen, and L. Dai, "A Gradient Approach of Smartly Allocating Computing Budget for Discrete Event Simulation," Proc. of the 1996 Winter Simulation Conference, 398-405, December 1996.
9. Chen, C. H., V. Kumar, and Y. C. Luo, "Motion Planning of Walking Robots Using Ordinal Optimization," IEEE Robotics and Automation Magazine, 22-32, June 1998a.
10. Chen, C. H., S. D. Wu, and L. Dai, "Ordinal Comparison of Heuristic Algorithms Using Stochastic Optimization," to appear in IEEE Trans. on Robotics and Automation, 1998b.
11. Chen, H. C., C. H. Chen, and E. Yucesan, "Optimal Computing Budget
Allocation in Selecting the Best System Design via Discrete Event Simulation," submitted to IEEE Trans. on Control Systems Technology, 1998c.
12. Chick, S. E., "Bayesian Analysis for Simulation Input and Output," Proc. of the 1997 Winter Simulation Conference, 253-260, 1997.
13. Dai, L., "Convergence Properties of Ordinal Comparison in the Simulation of Discrete Event Dynamic Systems," J. of Optimization Theory and Applications, Vol. 91, No. 2, 363-388, 1996.
14. Dai, L., C. G. Cassandras, and C. Panayiotou, "On the Convergence Rate of Ordinal Optimization for a Class of Stochastic Discrete Resource Allocation Problems," to appear in IEEE Trans. on Automatic Control, 1998.
15. Dai, L., and C. H. Chen, "Rate of Convergence for Ordinal Comparison of Dependent Simulations in Discrete Event Dynamic Systems," J. of Optimization Theory and Applications, Vol. 94, No. 1, 29-54, July 1997.
16. Dudewicz, E. J., and S. R. Dalal, "Allocation of Observations in Ranking and Selection with Unequal Variances," Sankhya, B37, 28-78, 1975.
17. Dunnett, C. W., "A Multiple Comparison Procedure for Comparing Several Treatments with a Control," J. of the American Statistical Association, 50, 1096-1121, 1955.
18. Goldsman, G., and B. L. Nelson, "Ranking, Selection, and Multiple Comparison in Computer Simulation," Proc. of the 1994 Winter Simulation Conference, 192-199, 1994.
19. Goldsman, G., B. L. Nelson, and B. Schmeiser, "Methods for Selecting the Best System," Proc. of the 1991 Winter Simulation Conference, 177-186, 1991.
20. Gong, W. B., Y. C. Ho, and W. Zhai, "Stochastic Comparison Algorithm for Discrete Optimization with Estimation," to appear in SIAM J. on Optimization, April 1999.
21. Gupta, S. S., and S. Panchapakesan, Multiple Decision Procedures: Theory and Methodology of Selecting and Ranking Populations, John Wiley, New York, 1979.
22. Ho, Y. C., R. S. Sreenivas, and P. Vakili, "Ordinal Optimization of DEDS," J. of Discrete Event Dynamic Systems, Vol. 2, No. 2, 61-88, 1992.
23. Kalashnikov, V. V., Topics on
Regenerative Processes, CRC Press, 1994.
24. Lau, T. W. E., and Y. C. Ho, "Universal Alignment Probabilities and Subset Selection for Ordinal Optimization," J. of Optimization Theory and Applications, Vol. 39, No. 3, 455-490, June 1997.
25. Lee, L. H., T. W. E. Lau, and Y. C. Ho, "Explanation of Goal Softening in Ordinal Optimization," IEEE Trans. on Automatic Control, July 1998.
26. Luenberger, D. G., Linear and Nonlinear Programming, Addison-Wesley, 1984.
27. Nash, S. G., and A. Sofer, Linear and Nonlinear Programming, McGraw-Hill, 1996.
28. Patsis, N. T., C. H. Chen, and M. E. Larson, "SIMD Parallel Discrete Event Dynamic System Simulation," IEEE Trans. on Control Systems Technology, Vol. 5, No. 3, 30-41, Jan. 1997.
29. Rinott, Y., "On Two-Stage Selection Procedures and Related Probability Inequalities," Communications in Statistics, A7, 799-811, 1978.
30. Xie, X. L., "Dynamics and Convergence Rate of Ordinal Comparison of Stochastic Discrete Event Systems," IEEE Trans. on Automatic Control, 1998.

Figure 6.2. Typical traces for different initial allocations.
