
Development of Parallel Block Multistage Scheme of Dimension Reduction for Globalizer Lite Parallel Software System


Procedia Computer Science, Volume 101, 2016, Pages 18-26
YSC 2016: 5th International Young Scientist Conference on Computational Science

Anna Zhbanova
Lobachevskii State University of Nizhny Novgorod, Russia
annazhb@gmail.com

doi:10.1016/j.procs.2016.11.004
© 2016 The Authors. Published by Elsevier B.V. Peer-review under responsibility of the organizing committee of the 5th International Young Scientist Conference on Computational Science.

Abstract

An approach to solving global optimization problems using the block multistage scheme of dimension reduction, which combines Peano-curve-type evolvents with the multistage reduction scheme, is presented. The scheme allows efficient parallelization of the computations and increases many times over the number of processors that can be employed in the parallel solving of global search problems. The synchronous and asynchronous schemes of the MPI implementation of this approach in the Globalizer Lite software system are described. The results of comparing the schemes, which demonstrate the advantage of the asynchronous variant, are presented.

Keywords: multidimensional multiextremal optimization, global search algorithms, parallel computations, dimension reduction, block multistage dimension reduction scheme

1. Introduction

In spite of the rapid growth of computer performance, there remain important problems whose analysis and investigation exceed the capabilities of existing computer systems (Strongin, 2013). Among them are problems that reduce to multidimensional multiextremal nonlinear programming problems whose sought solution is the global extremum. The computational cost of such problems makes it necessary to parallelize the algorithms and to use parallel computer systems.

The present paper is arranged as follows. In Sec. 2, the general statement of the multidimensional global optimization problem is given. In Sec. 3, the approach to solving it based on the information-statistical theory of multiextremal optimization (Strongin, Sergeyev, 2000) is described. In Sec. 4, the Globalizer Lite software system is presented, together with the implementations of the synchronous and asynchronous parallelization of the block multistage scheme. In Sec. 5, the results of experiments comparing the efficiency of these implementations are presented. In the Conclusion, the findings and the plans for further investigation are formulated.

2. Statement of the Multidimensional Global Optimization Problem

Let us consider a multidimensional multiextremal optimization problem

$$\varphi(y) \to \inf, \quad y \in D \subset \mathbb{R}^N, \tag{1}$$

$$D = \{ y \in \mathbb{R}^N : a_i \le y_i \le b_i, \; 1 \le i \le N \}, \tag{2}$$

i.e., a problem of finding the extremal value of the objective (minimized) function $\varphi(y)$ in a domain $D$ defined by the coordinate constraints (2) on the choice of feasible points $y = (y_1, y_2, \dots, y_N)$. If $y^*$ is an exact solution of problem (1)-(2), the numerical solution consists in building an estimate $y^0$ of the exact solution matching some notion of nearness to a point (for example, $\| y^* - y^0 \| \le \varepsilon$, where $\varepsilon > 0$ is a predefined precision) based on a finite number of computations of the optimized function values.
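Before proceeding, a minimal sketch of how a problem of the form (1)-(2) might be represented in code (the names are illustrative, not part of Globalizer Lite):

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class GlobalOptProblem:
    """Problem (1)-(2): minimize phi over the box D in R^N."""
    phi: Callable[[List[float]], float]   # objective; assumed Lipschitz-continuous, condition (3) below
    bounds: List[Tuple[float, float]]     # the coordinate constraints (a_i, b_i) of (2)
    eps: float = 1e-3                     # precision of the sought estimate y0

    @property
    def dim(self) -> int:                 # the dimensionality N
        return len(self.bounds)

# Example: a 2-dimensional multiextremal objective on [-5.12, 5.12]^2
problem = GlobalOptProblem(
    phi=lambda y: 20 + sum(v * v - 10 * math.cos(2 * math.pi * v) for v in y),
    bounds=[(-5.12, 5.12)] * 2,
)
```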
For the class of problems considered, the fulfillment of the following important conditions is presumed:

1. The optimized function $\varphi(y)$ can be defined by some algorithm for computing its values at the points of the domain $D$.
2. The computation of the function value at every point is a computationally costly operation.
3. The function $\varphi(y)$ satisfies the Lipschitz condition

$$|\varphi(y_1) - \varphi(y_2)| \le L \| y_1 - y_2 \|, \quad y_1, y_2 \in D, \; 0 < L < \infty, \tag{3}$$

which corresponds to a bounded variation of the function value under a bounded variation of the argument.

Multiextremal optimization problems, i.e., problems in which the objective function $\varphi(y)$ has several local extrema in the feasible domain, are the subject of the present paper. The dimensionality affects the difficulty of solving such problems considerably: for multiextremal problems, the so-called "curse of dimensionality" takes place, consisting in an exponential growth of the computational costs with increasing dimensionality.

3. The Approach to Solving Multidimensional Global Optimization Problems

3.1 Methods of Dimension Reduction

One of the approaches to solving multidimensional global optimization problems consists in reducing them to one-dimensional problems and applying efficient one-dimensional global search algorithms to the reduced problems. The reduction can be applied to the domain $D$ of (2), mapping the hyperparallelepiped $D$ onto the interval $[0,1]$ in a one-to-one manner, as well as to the function $\varphi(y)$, whose minimization can be performed on the basis of the recursive scheme (Sergeyev, Grishagin, 1994):

$$\min\{ \varphi(y) : y \in D \} = \min_{a_1 \le y_1 \le b_1} \; \min_{a_2 \le y_2 \le b_2} \cdots \min_{a_N \le y_N \le b_N} \varphi(y). \tag{4}$$

For the multistage scheme of dimension reduction represented by relation (4), a generalization has been proposed (Barkalov, Gergel, 2014): a block multistage scheme of dimension reduction, which reduces the solving of the initial multidimensional optimization problem (1)-(2) to the solving of a sequence of "nested" problems of lower dimensionality.

Thus, the initial vector $y$ is represented as a vector of "aggregated" macro-variables

$$y = (y_1, y_2, \dots, y_N) = (u_1, u_2, \dots, u_M), \tag{5}$$

where the $i$-th macro-variable $u_i$ is a vector of dimensionality $N_i$ composed of consecutive components of the vector $y$, i.e.,

$$u_1 = (y_1, y_2, \dots, y_{N_1}), \quad u_2 = (y_{N_1+1}, y_{N_1+2}, \dots, y_{N_1+N_2}), \quad u_i = (y_{p+1}, \dots, y_{p+N_i}), \; \text{where } p = \sum_{k=1}^{i-1} N_k, \tag{6}$$

and $\sum_{k=1}^{M} N_k = N$. Using the macro-variables, the main relation of the multistage scheme (4) can be rewritten in the form

$$\min_{y \in D} \varphi(y) = \min_{u_1 \in D_1} \; \min_{u_2 \in D_2} \cdots \min_{u_M \in D_M} \varphi(y), \tag{7}$$

where the subdomains $D_i$, $1 \le i \le M$, are the projections of the initial search domain $D$ onto the subspaces corresponding to the macro-variables $u_i$, $1 \le i \le M$. The principal difference from the initial scheme is that the nested subproblems

$$\varphi_i(u_1, \dots, u_i) = \min_{u_{i+1} \in D_{i+1}} \varphi_{i+1}(u_1, \dots, u_i, u_{i+1}), \quad 1 \le i \le M - 1, \tag{8}$$

are multidimensional in the block multistage scheme. This approach can therefore be combined with the reduction of the domain $D$ (for example, with the evolvent based on the Peano curve) so that efficient methods of solving one-dimensional multiextremal programming problems can be used (Strongin, 2013); a sketch of the nested scheme is given below.
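To make the nesting concrete, here is a minimal sketch of relations (7)-(8), in which a crude uniform grid stands in for the real low-dimensional solver (MAGS over an evolvent); the function, bounds, and names are illustrative only:

```python
import itertools
from typing import Callable, List, Sequence, Tuple

def block_multistage_min(phi: Callable[[List[float]], float],
                         blocks: Sequence[Sequence[Tuple[float, float]]],
                         samples_per_dim: int = 11) -> float:
    """Nested minimization over macro-variables, relations (7)-(8):
    the value of phi_i at a fixed prefix (u_1, ..., u_i) is obtained by
    minimizing phi_{i+1} over the next macro-variable u_{i+1}."""
    def grid(bounds):
        # uniform grid over one macro-variable's box; a crude stand-in
        # for the one-dimensional solver working over an evolvent
        axes = [[a + (b - a) * j / (samples_per_dim - 1)
                 for j in range(samples_per_dim)] for a, b in bounds]
        return itertools.product(*axes)

    def solve_level(i: int, fixed: List[float]) -> float:
        best = float("inf")
        for u in grid(blocks[i]):
            point = fixed + list(u)
            value = (phi(point) if i == len(blocks) - 1
                     else solve_level(i + 1, point))   # relation (8)
            best = min(best, value)
        return best

    return solve_level(0, [])

# Example: an N = 4 problem split into two macro-variables with N_1 = N_2 = 2
f = lambda y: sum((v - 1.0) ** 2 for v in y)
print(block_multistage_min(f, [[(-2.0, 2.0)] * 2, [(-2.0, 2.0)] * 2]))
```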
The Peano curve $y(x)$ allows mapping the interval $[0,1]$ of the real axis onto the domain $D$ in a one-to-one manner:

$$\{ y \in D \subset \mathbb{R}^N \} = \{ y(x) : 0 \le x \le 1 \}. \tag{9}$$

The evolvent is an approximation to the Peano curve with precision of the order $2^{-m}$, where $m$ is the density of the evolvent. Applying mappings of this kind reduces multidimensional problem (1)-(2) to a one-dimensional one:

$$\varphi(y^*) = \varphi(y(x^*)) = \min\{ \varphi(y(x)) : x \in [0,1] \}. \tag{10}$$

3.2 Method for Solving the Reduced Global Optimization Problems

The Multidimensional Algorithm of Global Search (MAGS) forms the basis of the methods applied in Globalizer Lite. The general computational scheme of MAGS can be presented as follows; see (Strongin, 1978) and (Strongin, Sergeyev, 2000). Let us introduce a simpler notation for the problem being solved:

$$f(x) = \varphi(y(x)), \quad x \in [0,1]. \tag{11}$$

Let us assume that $k > 1$ iterations of the method have been completed (as the point of the first trial $x^1$, an arbitrary point of the interval, for example its middle, is selected). Then, at the $(k+1)$-th iteration, the next trial point is selected according to the following rules.

Rule 1. Renumber the points of the preceding trials $x^1, \dots, x^k$ (including the boundary points of the interval) by lower indices in the order of increasing coordinate values:

$$0 = x_0 < x_1 < \cdots < x_i < \cdots < x_k < x_{k+1} = 1. \tag{12}$$

The function values $z_i = f(x_i)$ have been calculated at all points $x_i$, $i = 1, \dots, k$. At the points $x_0 = 0$ and $x_{k+1} = 1$ the function values have not been computed; the fictitious designations $z_0$ and $z_{k+1}$ are introduced for convenience of further explanation.

Rule 2. Compute the quantities

$$\mu = \max_{1 \le i \le k} \frac{|z_i - z_{i-1}|}{\Delta_i}, \qquad M = \begin{cases} r\mu, & \mu > 0, \\ 1, & \mu = 0, \end{cases} \tag{13}$$

where $r > 1$ is the reliability parameter of the method (specified by the user) and $\Delta_i = x_i - x_{i-1}$.

Rule 3. Compute the characteristics of all intervals $(x_{i-1}, x_i)$, $1 \le i \le k+1$, according to the formulae

$$R(1) = 2\Delta_1 - 4\frac{z_1}{M}, \qquad R(k+1) = 2\Delta_{k+1} - 4\frac{z_k}{M},$$

$$R(i) = \Delta_i + \frac{(z_i - z_{i-1})^2}{M^2 \Delta_i} - 2\,\frac{z_i + z_{i-1}}{M}, \quad 1 < i < k+1. \tag{14}$$

Rule 4. Select the interval with the highest characteristic and denote its index by $t$.

Rule 5. Execute the next trial at the point

$$x^{k+1} = \begin{cases} \dfrac{x_t + x_{t-1}}{2}, & t \in \{1, k+1\}, \\[2mm] \dfrac{x_t + x_{t-1}}{2} - \operatorname{sign}(z_t - z_{t-1})\,\dfrac{1}{2r}\left[\dfrac{|z_t - z_{t-1}|}{\mu}\right]^N, & 1 < t < k+1. \end{cases} \tag{15}$$

The algorithm terminates when the condition $\Delta_t \le \varepsilon$ is satisfied. The values

$$\varphi_k^* = \min_{1 \le i \le k} \varphi(x_i), \qquad x_k^* = \arg\min_{1 \le i \le k} \varphi(x_i) \tag{16}$$

are taken as the estimates of the global solution of problem (1)-(2).

The computational scheme of the Parallel Multidimensional Algorithm of Global Search (PMAGS) is practically identical to the MAGS scheme; the differences consist in the following rules.

Rule 4'. Arrange the characteristics of the intervals obtained according to (14) in decreasing order,

$$R(t_1) \ge R(t_2) \ge \cdots \ge R(t_k) \ge R(t_{k+1}), \tag{17}$$

and select $p$ intervals with the highest characteristics ($p$ is the number of processors/cores used for the parallel computations).

Rule 5'. Execute new trials at the points

$$x^{k+j} = \begin{cases} \dfrac{x_{t_j} + x_{t_j-1}}{2}, & t_j \in \{1, k+1\}, \\[2mm] \dfrac{x_{t_j} + x_{t_j-1}}{2} - \operatorname{sign}(z_{t_j} - z_{t_j-1})\,\dfrac{1}{2r}\left[\dfrac{|z_{t_j} - z_{t_j-1}|}{\mu}\right]^N, & 1 < t_j < k+1. \end{cases} \tag{18}$$
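Rules 1-5 translate almost directly into code. The following is a minimal sketch for the reduced problem (11) on $[0,1]$; the guard for the case $\mu = 0$ and all names are ours, and this is an illustration of the scheme rather than the Globalizer Lite implementation:

```python
import math

def mags_1d(f, eps=1e-4, r=2.0, N=1, max_iter=100000):
    """A sketch of MAGS rules 1-5 for the reduced problem (11) on [0, 1].
    Assumes f(x) = phi(y(x)); r is the reliability parameter, N the
    dimensionality of the original problem."""
    xs = [0.0, 0.5, 1.0]          # x_0 = 0 and x_{k+1} = 1 are boundary points
    zs = [None, f(0.5), None]     # fictitious z_0, z_{k+1} are never used here
    for _ in range(max_iter):
        k = len(xs) - 2           # number of points where f has been evaluated
        # Rule 2: lower estimate of the Lipschitz constant over known intervals
        mu = max((abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1])
                  for i in range(2, k + 1)), default=0.0)
        M = r * mu if mu > 0 else 1.0
        # Rule 3: interval characteristics (14)
        def R(i):
            d = xs[i] - xs[i - 1]
            if i == 1:
                return 2 * d - 4 * zs[1] / M
            if i == k + 1:
                return 2 * d - 4 * zs[k] / M
            dz = zs[i] - zs[i - 1]
            return d + dz * dz / (M * M * d) - 2 * (zs[i] + zs[i - 1]) / M
        # Rule 4: interval with the highest characteristic
        t = max(range(1, k + 2), key=R)
        if xs[t] - xs[t - 1] <= eps:        # termination: Delta_t <= eps
            break
        # Rule 5: the next trial point (15)
        if t == 1 or t == k + 1:
            x_new = (xs[t] + xs[t - 1]) / 2
        else:
            dz = zs[t] - zs[t - 1]
            step = (abs(dz) / mu) ** N / (2 * r) if mu > 0 else 0.0
            x_new = (xs[t] + xs[t - 1]) / 2 - math.copysign(step, dz)
        xs.insert(t, x_new)                 # Rule 1: points stay ordered
        zs.insert(t, f(x_new))
    i_best = min(range(1, len(xs) - 1), key=lambda i: zs[i])
    return xs[i_best], zs[i_best]           # estimates (16)

# Example: a multiextremal function of one variable
print(mags_1d(lambda x: math.sin(18 * x) + 0.5 * x))
```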
4. General Description of Globalizer Lite

4.1 Implementation of the Parallel Algorithm of Global Optimization

Let us consider a parallel implementation of the block multistage dimension reduction scheme described in Subsection 3.1. To describe the parallelism in the multistage scheme, let us introduce a vector of parallelization degrees

$$\pi = (\pi_1, \pi_2, \dots, \pi_M), \tag{19}$$

where $\pi_i$, $1 \le i \le M$, is the number of subproblems of the $(i+1)$-th nesting level solved in parallel, which arise as a result of executing parallel iterations at the $i$-th level. For the macro-variable $u_i$, the number $\pi_i$ is the number of parallel trials in the course of minimizing the function $\varphi_M(u_1, \dots, u_M) = \varphi(y_1, \dots, y_N)$ with respect to $u_i$ at fixed values of $u_1, u_2, \dots, u_{i-1}$, i.e., the number of values of the objective function $\varphi(y)$ computed in parallel.

In the general case, the quantities $\pi_i$, $1 \le i \le M$, can depend on various parameters and can vary in the course of the optimization, but we restrict ourselves to the case when all components of the vector $\pi$ are constant.

Thus, a tree of MPI-processes is built in the course of solving the problem. At every nesting level (every level of the tree), the parallel implementation of the algorithm of global search described in Subsection 3.2 is used. Recall that the parallelization consists in selecting not a single point for the next trial (as in the sequential version) but $p$ points, placed into the $p$ intervals with the highest characteristics. Therefore, if $p$ processors are available, the trials at these points can be executed in parallel. The solving of a problem at the $i$-th level of the tree generates the subproblems of the $(i+1)$-th level. This scheme corresponds to the "master-slave" method of organizing parallel computations.

When launching the software, the user specifies:

- the number of levels of subdivision of the initial problem (in other words, the number of levels in the tree of processes) $M$;
- the number of variables (dimensions) at each level ($\sum_{k=1}^{M} N_k = N$, where $N$ is the dimensionality of the problem);
- the number of MPI-processes and their distribution among the levels ($\pi = (\pi_1, \pi_2, \dots, \pi_M)$).

Let us consider an example: $N = 10$, $M = 3$, $N_1 = 3$, $N_2 = 4$, $N_3 = 3$, $\pi = (2, 3, 0)$. Therefore, we have nine MPI-processes arranged into a tree (Fig. 1). According to $N_1, N_2, N_3$ we have the following macro-variables: $u_1 = (y_1, y_2, y_3)$, $u_2 = (y_4, y_5, y_6, y_7)$, $u_3 = (y_8, y_9, y_{10})$. Each node solves a problem of the form (10). The root (level 0) solves the problem with respect to the first $N_1$ variables of the initial $N$-dimensional problem; each of its iterations generates a problem of the next level at some point. The nodes of level 1 solve the problems with respect to the $N_2$ variables at fixed values of the first $N_1$ variables, and so on.

Figure 1: Scheme of organization of parallel computations (the root solves $\varphi_1(u_1)$, its two children solve $\varphi_2(u_2)$, and each of them has three children solving $\varphi_3(u_3)$; for every function $\varphi_i$ only the varied parameters are shown, the fixed values being omitted).

4.2 Synchronous Scheme of Computations

Let $p$ be the number of child processes of a given node of the tree of MPI-processes. At every iteration, we select $p$ points for the next trials and transfer the data to the child processes. We then wait until all child processes have completed and receive the solutions of the subproblems from them. Finally, we compute the current estimate of the extremum and select the points for the next trials; a sketch of this loop is given after this paragraph.
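Under the assumption of an mpi4py-style API (the real system is C++/MPI, and the helper callables below are illustrative placeholders, not Globalizer Lite functions), the synchronous loop of one tree node might look as follows:

```python
from mpi4py import MPI  # the sketch assumes mpi4py is available

def run_synchronous_node(comm, children, select_trial_points,
                         update_estimate, solution_found):
    """A minimal sketch of the synchronous scheme for one node of the
    process tree.  The three callables stand in for the MAGS machinery
    (rules 4'-5' and the estimate (16))."""
    while not solution_found():
        # Rules 4'-5': p trial points for the p best intervals
        points = select_trial_points(len(children))
        for child, point in zip(children, points):
            comm.send(point, dest=child, tag=1)        # schedule the trials
        for child in children:
            # Synchronization point: wait for *every* child before the
            # estimate is updated; faster children stay idle here.
            update_estimate(comm.recv(source=child, tag=2))
    for child in children:
        comm.send(None, dest=child, tag=0)             # dismiss the subtree
```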
The main disadvantage of the synchronous scheme is that the child processes of a node that complete their trials within an iteration before the others stay idle, waiting for the full completion of the iteration. The asynchronous scheme considered below is free from this disadvantage.

4.3 Asynchronous Scheme of Computations

At the first iteration, we proceed in the same way as in the synchronous scheme: we generate the trial points according to the number of child processes, send the scheduled trial point to each child process, and wait until all child processes complete their subproblems. At the second iteration, we also send a single point to each child process. This is because the correct functioning of the asynchronous scheme requires that a sufficient number of intervals generated in the course of solving be available; in some cases the first iteration is not enough to ensure this, so the second iteration is also executed separately from all subsequent ones, according to the purely synchronous scheme. Within the second iteration, however, the transition to the asynchronous mode takes place: we wait until only one child process completes, without waiting for the rest, and receive the data from that process. At each subsequent iteration, we select only one trial point and send it to the child process from which a solution was received at the preceding iteration. Then we wait until any child process completes its own subproblem and receive the data from it. These operations are repeated until the required precision of the solution is reached; a sketch follows.
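A matching sketch of the asynchronous scheme, under the same assumptions as the synchronous one (mpi4py, placeholder callables for the MAGS machinery):

```python
from mpi4py import MPI  # illustrative sketch only; the real system is C++/MPI

def run_asynchronous_node(comm, children, select_trial_points,
                          update_estimate, solution_found):
    """Sketch of the asynchronous scheme of Subsection 4.3: iteration 1 is
    fully synchronous, iteration 2 still sends one point per child, and
    from then on the node reacts to whichever child finishes first."""
    # Iteration 1: synchronous warm-up so that enough intervals accumulate
    for child, point in zip(children, select_trial_points(len(children))):
        comm.send(point, dest=child, tag=1)
    for child in children:
        update_estimate(comm.recv(source=child, tag=2))
    # Iteration 2: one point per child again, but no full barrier afterwards
    for child, point in zip(children, select_trial_points(len(children))):
        comm.send(point, dest=child, tag=1)
    status = MPI.Status()
    while not solution_found():
        # Wait for *any* single child and immediately give it new work
        result = comm.recv(source=MPI.ANY_SOURCE, tag=2, status=status)
        update_estimate(result)
        comm.send(select_trial_points(1)[0], dest=status.Get_source(), tag=1)
    for child in children:
        comm.send(None, dest=child, tag=0)   # terminate the subtree
```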
4.4 Globalizer Lite System Architecture

The following basic classes are represented in the system (Fig. 2):

- TProcess: the manager class responsible for the creation of all objects of the system;
- TMethod: the class implementing the method of solving; performs the iterations of the index method taking into account the level in the process tree;
- TTask: the class encapsulating the optimization problem;
- TSearchData: the class encapsulating the accumulation and processing of the search data;
- TEvolvent, TRotatedEvolvents: the classes implementing the mapping (9) using the Peano curve;
- TPriorityQueue: stores the intervals with the best characteristics.

Subject to the scheme selected by the user (synchronous or asynchronous), an object of class TProcess (or TAsyncProcess) is created for each MPI-process; it stores all the information on the position of the process in the tree (the level of the tree, the parent process, the child processes), the problem being solved, and the method (an object of the class TMethod) by which the optimum is sought. Each process, according to its position in the tree, solves its own optimization problem using one of the schemes described in Subsections 4.2 and 4.3, so the parallelism and the distribution of data between the processes are carried out within the class TProcess.

Figure 2: Scheme of the classes.

5. Results of Numerical Experiments

In this section, the results of the experiments are presented. The first series of experiments was directed at comparing the synchronous and asynchronous schemes of the block multistage reduction, comparing various distributions of the dimensions among the levels of the tree of MPI-processes, and finding the most efficient way of distributing the problem data among the levels. The Rastrigin function (20), the Rosenbrock function (21), and the Neumaier function (22) were used in the experiments:

$$\varphi(y) = A n + \sum_{i=1}^{n} \left[ y_i^2 - A \cos(2\pi y_i) \right], \quad A = 10, \; y_i \in [-5.12, 5.12], \tag{20}$$

$$\varphi(y) = \sum_{i=1}^{n-1} \left[ (1 - y_i)^2 + 100 (y_{i+1} - y_i^2)^2 \right], \tag{21}$$

$$\varphi(y) = \sum_{k=1}^{n} \Big( b_k - \sum_{i=1}^{n} y_i^k \Big)^2, \quad y \in \mathbb{R}^N, \; b = \{8, 18, 44, 114\}. \tag{22}$$

The experiments were performed on the UNN computing cluster (40 compute nodes with dual-core Intel Xeon 3.2 GHz processors and 4 GB RAM each).
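For reference, the test functions might be implemented as sketched below. One caveat: the extraction of (22) is ambiguous about the exponent; the code uses $y_i^k$, which reproduces $b = \{8, 18, 44, 114\}$ at the known minimizer $(1, 2, 2, 3)$ of the 4-dimensional Neumaier problem. All function names are ours.

```python
import math

def rastrigin(y, A=10.0):                      # (20), y_i in [-5.12, 5.12]
    return A * len(y) + sum(v * v - A * math.cos(2 * math.pi * v) for v in y)

def rosenbrock(y):                             # (21)
    return sum((1 - y[i]) ** 2 + 100 * (y[i + 1] - y[i] ** 2) ** 2
               for i in range(len(y) - 1))

def neumaier2(y, b=(8, 18, 44, 114)):          # (22), exponent k assumed
    n = len(y)
    return sum((b[k - 1] - sum(v ** k for v in y)) ** 2 for k in range(1, n + 1))

# Sanity checks at the known global minimizers
assert rastrigin([0.0] * 4) == 0.0
assert rosenbrock([1.0] * 4) == 0.0
assert neumaier2([1.0, 2.0, 2.0, 3.0]) == 0.0
```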
Table 1: Results of solving the test problems for the Rastrigin function (4-dimensional).

Dimensions   Sync time (sec)   Async time (sec)   Sync trials of root   Async trials of root   Speedup
(1, 3)       5.874009          5.818838           33                    29                     1.009481
(2, 2)       0.156196          0.041423           522                   144                    3.770755
(3, 1)       0.068943          0.006823           3429                  307                    10.1045

Table 2: Results of solving the test problems for the Rosenbrock function (4-dimensional).

Dimensions   Sync time (sec)   Async time (sec)   Sync trials of root   Async trials of root   Speedup
(1, 3)       68.473302         69.707579          69                    70                     0.982294
(2, 2)       12.831859         7.235705           2691                  1931                   1.773408
(3, 1)       1.390790          0.535190           88428                 48930                  2.598685

Table 3: Results of solving the test problems for the Neumaier function (4-dimensional).

Dimensions   Sync time (sec)   Async time (sec)   Sync trials of root   Async trials of root   Speedup
(1, 3)       9.053919          8.444825           39                    35                     1.072126
(2, 2)       0.240824          0.162694           396                   258                    1.480227
(3, 1)       0.060880          0.032426           3705                  2274                   1.877506

From the results of the experiments conducted, one can see that the asynchronous version works faster in most cases; on average, a speedup of 1.5 times was achieved. When comparing the schemes, one can also note that the root-level process (the root of the tree) in the asynchronous scheme performs fewer iterations than in the synchronous one. The total computation time of the algorithm depends strongly on the problem being solved. Comparing the data obtained on the three test functions, one can conclude that, when distributing the dimensions among the levels of the tree at an equal number of processes, it is most profitable to leave the largest dimension at the zero level. Although the root then performs more operations than in the other cases, the total time of solving the problem is reduced considerably. Thus, the distribution of the dimensions among the levels is an important factor determining the computation time of the algorithm. It is also worth noting that increasing the number of levels in the tree does not always result in the desired speedup; this originates from the increasing amount of data transmitted between the processes. Nevertheless, if the objective function itself takes long enough to compute, such a variant of building the tree is justified.

The next series of experiments was directed at comparing various structures of the MPI-process trees and determining the effect of the tree structure on the rate of the problem solving. According to the data obtained, one can point out that (4, 2, 0) is the most profitable structure for the synchronous scheme. This can be explained by the fact that, for the functions considered here, which are computed relatively quickly, two child processes at the second level, which compute the function directly, are quite sufficient. In this connection, the structures (1, 11, 0) and (2, 5, 0) are less efficient. The structures having four levels are not efficient for the Rastrigin function, since the time costs of the data transfer essentially exceed those of computing the function itself.

Table 4: Results of solving the test problems for the Rastrigin function (5-dimensional).

Child processes at each level   Sync time (sec)   Async time (sec)   Sync trials of root   Async trials of root
(1, 11, 0)                      3.269679          1.368556           180                   123
(2, 5, 0)                       3.024346          1.478795           266                   131
(3, 3, 0)                       7.082422          1.698054           552                   150
(4, 2, 0)                       1.788619          2.552029           156                   214
(6, 1, 0)                       2.835875          1.517050           228                   136
(3, 1, 2, 0)                    22.036564         9.449429           552                   173
(2, 1, 4, 0)                    75.340692         7.429606           266                   93

This research was supported by the Russian Science Foundation, project No. 16-11-10150 "Novel efficient methods and software tools for the time consuming decision making problems with using supercomputers of superior performance".

Conclusion

The present work is devoted to investigating the possibility of speeding up the search for the global optimum when solving multidimensional multiextremal optimization problems by means of the parallel block multistage scheme of dimension reduction. The general principles of organizing the synchronous and asynchronous parallel computations have been considered, and the above schemes have been implemented using MPI. For the proposed block multistage scheme, the efficiency of its application to problems with various computation times of the optimized function has been studied. In the future, the Globalizer Lite system is planned to be used for solving applied problems.

References

Strongin, R.G. (1978). Numerical Methods in Multiextremal Problems (Information-Statistical Algorithms). Moscow: Nauka. In Russian.
Strongin, R.G., Sergeyev, Ya.D. (2000). Global Optimization with Non-convex Constraints: Sequential and Parallel Algorithms. Dordrecht: Kluwer Academic Publishers.
Strongin, R.G., Gergel, V.P., Grishagin, V.A., Barkalov, K.A. (2013). Parallel Computing in Global Optimization Problems: Monograph. Moscow: Moscow University Publishing, 280 p. (Series "Supercomputer Education"). In Russian.
Gergel, V., et al. (2013). High Performance Computing in Biomedical Applications. Procedia Computer Science, 18, 10-19.
Gergel, V., Grishagin, V., Israfilov, R. (2015). Local Tuning in Nested Scheme of Global Optimization. Procedia Computer Science, 51, 865-874.
Barkalov, K., Gergel, V. (2015). Parallel global optimization on GPU. Journal of Global Optimization, 1-18. DOI: 10.1007/s10898-016-0411-y.
Sergeyev, Y.D., Grishagin, V.A. (1994). Sequential and parallel algorithms for global optimization. Optimization Methods and Software, 3, 111-124.
Web Site jMetal 5 (2016). Retrieved from http://jmetal.github.io/jMetal/
Web Site PaGMO (2016). Retrieved from http://esa.github.io/pagmo/index.html
Web Site DEAP (2016). Retrieved from http://deap.readthedocs.io/en/master/