Algorithm / No. of workflows         20       40       60       80      100
MM-HEFT                           57.77   101.08   141.15   177.65   222.98
HEFT                               55.8      145      295      400    527.5
Hybrid.BMCT                        56.3      145      295      400    527.5
Table 4.3: Average makespan of the workflows.
Algorithm / No. of workflows         20       40       60       80      100
MM-HEFT                             683     1890     5830    11178    17572
HEFT                                401     2430    13357    26183    49596
Hybrid.BMCT                         456     3037    39263    62475   126762
Table 4.4: Average running time of the algorithms.
Algorithm / No. of workflows         20       40       60       80      100
MM-HEFT                           452.6   895.55   1220.4  1582.55   2068.8
HEFT                               55.8      145      295      400    527.5
Hybrid.BMCT                        56.3      145      295      400    527.5
Table 4.5: Average sequential execution time of the workflows on the resource cluster with the smallest total execution time (average sequential makespan).
Algorithm / No. of workflows         20       40       60       80      100
MM-HEFT                            7.83     8.86     8.65     8.91     9.28
HEFT                                  1        1        1        1        1
Hybrid.BMCT                           1        1        1        1        1
Table 4.6: Average ratio between sequential execution time and parallel execution time (average speedup).
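As a quick consistency check (our reading of the captions, not code from the thesis), Table 4.6 can be reproduced by dividing the average sequential makespan of Table 4.5 by the average makespan of Table 4.3. This divides averages, whereas the thesis may have averaged per-instance ratios, so agreement to two decimals is suggestive rather than conclusive.

```python
# Average sequential makespan (Table 4.5) and average makespan (Table 4.3),
# listed for 20, 40, 60, 80 and 100 workflows.
WORKFLOW_COUNTS = [20, 40, 60, 80, 100]

SEQUENTIAL = {
    "MM-HEFT": [452.6, 895.55, 1220.4, 1582.55, 2068.8],
    "HEFT": [55.8, 145, 295, 400, 527.5],
    "Hybrid.BMCT": [56.3, 145, 295, 400, 527.5],
}
MAKESPAN = {
    "MM-HEFT": [57.77, 101.08, 141.15, 177.65, 222.98],
    "HEFT": [55.8, 145, 295, 400, 527.5],
    "Hybrid.BMCT": [56.3, 145, 295, 400, 527.5],
}


def speedup(sequential, parallel):
    """Speedup = sequential execution time / parallel execution time, rounded to 2 decimals."""
    return [round(s / p, 2) for s, p in zip(sequential, parallel)]


SPEEDUP = {alg: speedup(SEQUENTIAL[alg], MAKESPAN[alg]) for alg in SEQUENTIAL}

if __name__ == "__main__":
    for alg, values in SPEEDUP.items():
        print(alg, dict(zip(WORKFLOW_COUNTS, values)))
```

A speedup of exactly 1 for HEFT and Hybrid.BMCT follows directly from their identical rows in Tables 4.3 and 4.5.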
Figure 4.3: Chart of the average makespan of the workflows.
Figure 4.4: Chart of the average running time of the algorithms.
Figure 4.5: Chart of the average ratio between sequential and parallel execution time (average speedup).
Chapter 5
CONCLUSION
5.1 Contributions
Within the scope of this research, the thesis focuses on the problem of allocating jobs to virtual computing resources provisioned by a private cloud. Its goal is to build models, algorithms and scheduling strategies that make good use of the provided resources so that the tasks of a workflow finish executing as early as possible. The specific contributions of the thesis are as follows:
• Built two mathematical models of the optimal workflow scheduling problem on cloud resources, formulated as integer linear programs.
• Analyzed and compared the efficiency of the two integer linear programming models through simulation results obtained from a solver.
• Formulated the problem of scheduling multiple workflows on multiple resource clusters distributed across the cloud with limited transmission bandwidth, and proposed the MM-HEFT heuristic algorithm to solve it.
• Analyzed and evaluated the efficiency of the proposed MM-HEFT algorithm against the approaches of other related works.
5.2 Future work
In practice, resources in a private cloud are not allocated solely for executing workflow applications; they also serve a variety of other computing and usage needs. During peak periods, the private cloud cannot guarantee enough resources for its users, so leasing external resources from other cloud service providers (public clouds) is one way to meet these user requirements.
From an investor's point of view, this requires a model for the cost of leasing virtual resources and raises the question: "Is there a way to lease resources from multiple cloud service providers to execute multiple workflow applications such that the total cost paid does not exceed an invested amount y, while still meeting the execution deadline of each workflow?" Data transfer speed [3] must also be considered when solving this problem: the data files of scientific workflows are very large, so it is inefficient to move data between cloud service providers while executing a workflow, because moving data into and out of cloud providers incurs high costs and takes a great deal of time.
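Under the simplifying assumption suggested by the last sentence, namely that each workflow runs entirely at one provider so that no data moves between providers, the investor's question becomes a small combinatorial search. The sketch below is hypothetical: the provider names, costs, completion times and deadlines are illustrative, and it ignores contention between workflows that share a provider.

```python
from itertools import product

# Hypothetical data: for each workflow, the providers it could run on entirely
# (no cross-provider data movement), as (leasing cost, expected completion time).
OPTIONS = {
    "wf1": {"private": (0.0, 120.0), "publicA": (30.0, 60.0), "publicB": (25.0, 80.0)},
    "wf2": {"private": (0.0, 200.0), "publicA": (50.0, 90.0), "publicB": (40.0, 110.0)},
}
DEADLINES = {"wf1": 100.0, "wf2": 150.0}


def feasible_plans(options, deadlines, budget):
    """Yield (plan, cost) for every workflow -> provider plan whose total cost
    is at most `budget` and which meets every workflow's deadline.
    Exhaustive search: only suitable for tiny illustrative instances."""
    names = list(options)
    for choice in product(*(options[w] for w in names)):
        plan = dict(zip(names, choice))
        cost = sum(options[w][p][0] for w, p in plan.items())
        on_time = all(options[w][p][1] <= deadlines[w] for w, p in plan.items())
        if cost <= budget and on_time:
            yield plan, cost


if __name__ == "__main__":
    for plan, cost in feasible_plans(OPTIONS, DEADLINES, budget=70.0):
        print(cost, plan)
```

The answer to the investor's question is "yes" exactly when the generator yields at least one plan; with a budget of 70 this toy instance has two feasible plans, and with a budget of 60 it has none.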
References
[1] Y. Gil, E. Deelman et al., "Examining the Challenges of Scientific Workflows," IEEE Computer, vol. 40, no. 12, pp. 24-32, 2007.
[2] S. Bharathi, A. Chervenak, E. Deelman, G. Mehta, M.-H. Su, and K. Vahi, "Characterization of Scientific Workflows," The 3rd Workshop on Workflows in Support of Large-Scale Science, pp. 1-10, 2008.
[3] E. Deelman and A. Chervenak, "Data Management Challenges of Data-Intensive Scientific Workflows," IEEE International Symposium on Cluster Computing and the Grid (CCGrid 2008), pp. 687-692, 2008.
[4] Saeid Abrishami, Mahmoud Naghibzadeh, Dick H.J. Epema, "Deadline-constrained Workflow Scheduling Algorithms for Infrastructure as a Service Clouds," Future Generation Computer Systems, pp. 158-169, 2013.
[5] Thiago A.L. Genez, Luiz F. Bittencourt, Edmundo R.M. Madeira, "Workflow Scheduling for SaaS/PaaS Cloud Providers Considering Two SLA Levels," IEEE/IFIP Network Operations and Management Symposium (NOMS), pp. 906-912, 2012.
[6] Suraj Pandey, Linlin Wu, Siddeswara M. Guru, Rajkumar Buyya, "A Particle Swarm Optimization-based Heuristic for Scheduling Workflow Applications in Cloud Computing Environments," IEEE International Conference on Advanced Information Networking and Applications, pp. 400-407, 2010.
[7] Dong Yuan, "Achieving the Best Trade-Off between Computation and Storage in the Cloud," PhD thesis, Swinburne University of Technology, December 2011.
[8] Ke Liu, "Scheduling Algorithms for Instance-Intensive Cloud Workflows," PhD thesis, Swinburne University of Technology, June 2009.
[9] U. Honig and W. Schiffmann, "A Meta-algorithm for Scheduling Multiple DAGs in Homogeneous System Environments," In Proceedings of the 18th International Conference on Parallel and Distributed Computing and Systems (PDCS'06), IEEE, 2006.
[10] H. Zhao and R. Sakellariou, "Scheduling Multiple DAGs onto Heterogeneous Systems," In Proceedings of the 15th Heterogeneous Computing Workshop (HCW), April 2006.
[11] H. Topcuoglu, S. Hariri, and M. Y. Wu, "Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing," IEEE Transactions on Parallel and Distributed Systems, pp. 260-274, March 2002.
[12] R. Sakellariou and H. Zhao, "A Hybrid Heuristic for DAG Scheduling on Heterogeneous Systems," Proceedings of the 13th Heterogeneous Computing Workshop (HCW 2004), pp. 253-262, 2004.
[13] R. Prodan, T. Fahringer, "Overhead Analysis of Scientific Workflows in Grid Environments," IEEE Transactions on Parallel and Distributed Systems 19 (2008), pp. 378-393, 2009.
[14] M. Maheswaran, S. Ali, H. J. Siegel, D. Hensgen, and R. F. Freund, "Dynamic Mapping of a Class of Independent Tasks onto Heterogeneous Computing Systems," Journal of Parallel and Distributed Computing, pp. 107-131, 1999.
[15] T. D. Braun, H. J. Siegel and N. Beck, "A Comparison of Eleven Static Heuristics for Mapping a Class of Independent Tasks onto Heterogeneous Distributed Computing Systems," Journal of Parallel and Distributed Computing, pp. 810-837, 2001.
[16] Pham Hoang Vu DONG, Tuan TRAN PHUONG, Thanh Van LE, Nguyen HUYNH TUONG, Van Hoai TRAN, "Scheduling Workflow Applications in Private Cloud Considering Network Bandwidth," In Proceedings of International Conference on Advanced Computing and Applications (ACOMP 2013), pp. 20-29, 2013.
[17] J. Yu and R. Buyya, "Workflow Scheduling Algorithms for Grid Computing," Technical Report GRIDS-TR-2007-10, Grid Computing and Distributed Systems Laboratory, 2007.
[18] F. Dong and S. G. Akl, "Scheduling Algorithms for Grid Computing: State of the Art and Open Problems," Technical Report No. 2006-504, School of Computing, Queen's University, Kingston, Ontario, Canada, 2006.
Published works
1. Pham Hoang Vu DONG, Tuan TRAN PHUONG, Thanh Van LE, Nguyen HUYNH TUONG, Van Hoai TRAN: Scheduling Workflow Applications in Private Cloud Considering Network Bandwidth. In Proceedings of International Conference on Advanced Computing and Applications (ACOMP 2013), pp. 20-29, 2013.
Journal of Science and Technology 51 (4B) (2013) 20-29
SCHEDULING WORKFLOW APPLICATIONS
IN PRIVATE CLOUD CONSIDERING NETWORK BANDWIDTH
Pham Hoang Vu DONG*, Tuan TRAN PHUONG, Thanh Van LE, Nguyen HUYNH TUONG, Van Hoai TRAN
Faculty of Computer Science & Engineering, HCMC University of Technology, VNUHCM, Vietnam
*Email: hoangvudp@gmail.com
Received: 15 July 2013, Accepted for publication: 16 October 2013
ABSTRACT
The emergence of cloud computing offers a new model of service provisioning in distributed systems. This encourages many research activities to investigate its benefits in running scientific workflow applications. One of the most challenging problems in clouds is workflow scheduling. In this paper, a workflow scheduling problem in a private cloud is considered: when computational results are transferred between tasks, the different network bandwidths among virtual resources are taken into account. The objective used to measure the performance of a schedule in this study is to minimize the makespan. Two integer linear programs (ILP) that model the optimization problem are proposed. They are solved by a well-known ILP solver, and the performance of the two approaches is analyzed and compared through simulation results.
Keywords: cloud computing, linear programming, virtual machine allocation, workflow scheduling.
1. INTRODUCTION
Workflows are a common model for describing a wide range of scientific applications in distributed systems [1]. These scientific workflows need to process huge amounts of data and involve computationally intensive activities. They are usually represented by a directed acyclic graph (DAG) in which each computational task is denoted by a node, and each data dependency between two tasks is denoted by a directed edge between the corresponding nodes. Because of the importance of workflow applications, many recent studies consider the benefits of using cloud computing for executing scientific workflows [2 - 4]. Some features of cloud computing also meet the requirements of scientific workflows: cloud computing provides not only high performance computing but also the scalable infrastructure required by scientific applications.
By taking advantage of cloud computing, scientific workflow systems could gain wider utilization. However, they also face some new challenges, and workflow scheduling is one of them. Workflow scheduling is the problem of mapping each task to a suitable resource and ordering the tasks on each resource to satisfy certain performance criteria. Scheduling strategies are developed for different objectives, such as minimizing the total execution time, minimizing the total execution cost, minimizing the execution cost while meeting a user-defined deadline, and so forth. In this study, we focus on minimizing the makespan of workflow tasks that are assigned to execute on computational virtual resources provided by a private cloud, where the connection links between virtual resources have different bandwidths. To achieve this, we formulate the scheduler as an ILP model with this objective.
The main contribution of this paper is to propose ILP models for the scheduling problem, together with simulations that evaluate the scheduling results of these models. The remainder of the paper is organized as follows. Section 2 discusses related work. Section 3 introduces the notation used in this paper and states the considered problem in detail. Section 4 then proposes two ILP models describing the problem. Section 5 presents the simulation experiments and analysis. Section 6 concludes this study and discusses future work.
2. RELATED WORK
A few works address workflow scheduling in grid environments. Deelman et al. [5] have done considerable work on planning, mapping and data reuse in the area of workflow scheduling. They proposed Pegasus [5], a framework that maps complex scientific workflows onto distributed resources such as the Grid. Ostermann et al. [6] consider scheduling workflows in a grid environment that is capable of leasing cloud resources whenever needed.
Abrishami et al. [7] proposed two workflow scheduling algorithms for the IaaS cloud environment based on the concept of the partial critical path (PCP). They aim to create a schedule that minimizes the total execution cost of a workflow while satisfying a user-defined deadline. Although their strategies minimize the monetary cost of the workflow execution within a deadline, they do not consider the bandwidth among computational resources, so the transfer time of a data dependency between two tasks depends only on the amount of data to be transferred.
In [8], the workflow scheduling problem was formulated as an ILP that considers leasing reserved and on-demand resources from multiple IaaS providers according to a two-level SLA. The scheduler can run in either a SaaS or a PaaS cloud provider and receives workflow execution requests with deadlines from clients, and it can also lease resources from multiple IaaS providers. However, it is normally not practical to run scientific applications across different cloud service providers, for two reasons. First, the data in scientific applications are often too large to be transferred efficiently given bandwidth limitations. Second, cloud service providers charge high prices for transferring data into and out of their data centers.
Catalyurek et al. [9] proposed a heuristic that models the workflow as a hypergraph and considers both data placement and task assignment schemes simultaneously. They assume that after a workflow task finishes its execution, there is only one generated output file. In our case, we are interested in modeling a more realistic scenario, and therefore we consider that there are many generated files with different sizes.
A characteristic of the related work above on workflow scheduling in clouds is that it takes into account the pricing of public cloud service providers under execution time constraints. A common assumption in workflow scheduling is that the data transfer speed is the same for every network link. However, in certain cases, the transfer speed is likely to differ between pairs of nodes. In this paper, we propose mathematical models to address the scheduling problem in such situations. In a private cloud, bandwidth usage cost is not a concern, so the objective is to achieve the optimal completion time. The considered problem is explained in more detail in the next section.
3. PROBLEM STATEMENT
3.1. Terminology, notation
Figure 1. A sample workflow.
Notation used in this paper is given below.
• DAG $G=(U,E)$: each $u_i \in U$ represents a task, and each edge $e_{i,j} \in E$ represents a data dependency between tasks $u_i$ and $u_j$; $e_{i,j}$ is the data produced by task $u_i$ and consumed by $u_j$. Labels on nodes represent computation costs (computing demands, for instance), while labels on edges represent communication costs (the size of the file that needs to be transferred), as shown in the sample of Figure 1;
• $n$: number of tasks in DAG $G$ ($n \in \mathbb{N}$);
• $U=\{u_1,\ldots,u_n\}$: the set of tasks of the workflow application. A task without any parent is called the entry task $u_1$, and a task without any child is called the exit task $u_n$;
• $W=\{w_1,\ldots,w_n\}$: the set of processing demands of the tasks $u_i \in U$;
• $V$: the set of virtual machines (VMs) (heterogeneous resources) allocated by all physical machines in the private cloud. The number of VMs is $m=|V|$;
• $f_{i,j}$: the size of the file that needs to be transferred between tasks $u_i$ and $u_j$;
• $H(j)$: the set of immediate predecessors of task $u_j \in U$;
• $A_v$: the computing ability of VM $v \in V$. Once task $u_i$ is assigned to VM $v$, the execution time of $u_i$ can be obtained as $P_{u_i} = w_i / A_v$;
• $B_{v,r}$: the bandwidth between two VMs $v, r \in V$. If tasks $u_i$ and $u_j$ are assigned to different VMs $v$ and $r$, respectively, then the transfer time can be determined as $Q_{u_i,u_j} = f_{i,j} / B_{v,r}$.
Note that task $u_j$ cannot start executing until task $u_i$ has finished, which takes execution time $P_{u_i}$, and its output has been transferred, which takes transfer time $Q_{u_i,u_j}$. We assume the transfer time $Q_{u_i,u_j} = 0$ when tasks $u_i$ and $u_j$ are assigned to the same physical machine;
• The formulation of the ILP problem considers discrete time intervals. Let $T=\{0,\ldots,T_{max}\}$ be the timeline of the possible workflow execution time.
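The two formulas above, together with the precedence note, can be sketched in Python as follows. The 4-task workflow, the VM abilities and the bandwidths are hypothetical values chosen for illustration (Figure 1 is not reproduced here), and the sketch treats "same machine" as "same VM", a simplification of the paper's same-physical-machine assumption.

```python
from collections import defaultdict

# Hypothetical 4-task workflow (illustrative values, not Figure 1):
# processing demands w_i and file sizes f_{i,j} on each edge e_{i,j}.
W = {1: 10.0, 2: 6.0, 3: 8.0, 4: 4.0}
F = {(1, 2): 20.0, (1, 3): 10.0, (2, 4): 5.0, (3, 4): 15.0}

# Hypothetical computing abilities A_v and bandwidths B_{v,r} between VMs.
A = {"v1": 2.0, "v2": 1.0}
B = {("v1", "v2"): 5.0, ("v2", "v1"): 5.0}


def predecessors(edges):
    """Build H(j), the set of immediate predecessors of each task u_j."""
    h = defaultdict(set)
    for i, j in edges:
        h[j].add(i)
    return h


H = predecessors(F)


def exec_time(i, v):
    """P_{u_i} = w_i / A_v when task u_i runs on VM v."""
    return W[i] / A[v]


def transfer_time(i, j, v, r):
    """Q_{u_i,u_j} = f_{i,j} / B_{v,r}; assumed zero when v and r are the same VM."""
    if v == r:
        return 0.0
    return F[(i, j)] / B[(v, r)]


def data_ready_time(j, assign, finish):
    """Earliest time all input data of u_j has arrived on its VM.

    assign: task -> VM; finish: task -> finish time of already-scheduled tasks.
    Only precedence and transfer are considered here, not VM availability.
    """
    r = assign[j]
    return max(
        (finish[i] + transfer_time(i, j, assign[i], r) for i in H[j]),
        default=0.0,
    )
```

For example, with tasks 1, 2 and 4 on v1, task 3 on v2, and finish times 5, 8 and 15 for tasks 1 to 3, `data_ready_time(4, ...)` returns 18.0, because the output of task 3 needs 15 / 5 = 3 time units to reach v1.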
In order to look at the problem in more detail, we formulate it in decision version as follows.
3.2. Decision problem
The decision version of the workflow scheduling problem is described as follows.
Data input: Given a workflow with $n$ tasks $u_1,\ldots,u_n$ that have to be scheduled without preemption on $m$ parallel machines. Each data dependency denotes a precedence constraint: a successor task can only start its execution after all of its predecessor tasks have finished their executions and all needed data have been transferred to the computational resource assigned to execute this successor task.
Question: Does there exist a resource assignment such that the makespan of the workflow execution over all computational resources is less than a given value $y$?
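For very small instances, the decision question can be answered by exhaustive search. The sketch below is our illustration, not the paper's ILP approach: it enumerates every topological order and every task-to-VM assignment, and starts each task as soon as both its VM and its input data are available. Assuming strictly positive execution times, this finds an optimal non-preemptive schedule, since ordering any feasible schedule by start time yields a topological order, and starting tasks as early as possible in that order never increases the makespan. The instance is hypothetical, zero transfer time is assumed only for tasks on the same VM, and the cost grows as $n! \cdot m^n$, so this is only for toy sizes.

```python
from itertools import permutations, product

# Hypothetical tiny instance (illustrative values, not from the paper).
TASKS = [1, 2, 3, 4]
W = {1: 10.0, 2: 6.0, 3: 8.0, 4: 4.0}                        # processing demands w_i
F = {(1, 2): 20.0, (1, 3): 10.0, (2, 4): 5.0, (3, 4): 15.0}  # file sizes f_{i,j}
A = {"v1": 2.0, "v2": 1.0}                                    # computing abilities A_v
B = {("v1", "v2"): 5.0, ("v2", "v1"): 5.0}                    # bandwidths B_{v,r}
PRED = {j: [i for (i, k) in F if k == j] for j in TASKS}      # H(j)


def is_topological(order):
    """True if every edge (i, j) has i before j in `order`."""
    pos = {u: k for k, u in enumerate(order)}
    return all(pos[i] < pos[j] for (i, j) in F)


def makespan(order, assign):
    """Place tasks in `order`, each as early as its VM and input data allow."""
    vm_free = {v: 0.0 for v in A}
    finish = {}
    for j in order:
        r = assign[j]
        ready = max(
            (finish[i] + (0.0 if assign[i] == r else F[(i, j)] / B[(assign[i], r)])
             for i in PRED[j]),
            default=0.0,
        )
        start = max(ready, vm_free[r])
        finish[j] = start + W[j] / A[r]   # P_{u_j} = w_j / A_r
        vm_free[r] = finish[j]
    return max(finish.values())


def exists_schedule(y):
    """Decision question: is there a non-preemptive schedule with makespan < y?"""
    for order in permutations(TASKS):
        if not is_topological(order):
            continue
        for vms in product(A, repeat=len(TASKS)):
            if makespan(order, dict(zip(TASKS, vms))) < y:
                return True
    return False
```

In this toy instance the slow VM v2 never pays off, so the best schedule runs all four tasks on v1 with makespan 14: `exists_schedule(14.5)` is true while `exists_schedule(14.0)` is false, because the question asks for a makespan strictly less than $y$.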
4. INTEGER LINEAR MATHEMATICAL MODELS
Constructing mathematical models provides a more comprehensive view of the mathematical aspects of the scheduling problem under consideration. The two models presented below show how the choice of decision variables affects the performance of a model: different decision variables yield different computing performance.
Before building the mathematical models, we identify the constraints of the problem as follows:
1. A VM v executes at most one task at any given time t;
2. Each task u must be executed exactly once, on a single VM v;
3. A successor task z cannot begin its execution until all of its predecessor tasks have finished their executions and their result data have arrived at the VM assigned to execute z;
4. Once a VM is assigned to a task, that task must be executed continuously (without preemption) on that VM until completion;
5. The variables of ILP will only take the binary values 0 or 1.
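Constraints 1 to 4 can also be read as a feasibility test for a finished schedule. The checker below is a hypothetical sketch: the representation task -> (vm, start, finish) and the callbacks `p` and `q` are our assumptions, and constraint 5 is omitted because it concerns the ILP variables rather than the schedule itself.

```python
def check_schedule(tasks, schedule, pred, p, q, eps=1e-9):
    """Check a schedule against constraints 1-4.

    schedule: task -> (vm, start, finish)   (hypothetical representation)
    pred:     task -> list of immediate predecessors, i.e. H(j)
    p(u, v):  execution time of task u on VM v
    q(i, j, v, r): transfer time of the data e_{i,j} from VM v to VM r
    Returns a list of human-readable violations (empty means feasible).
    """
    violations = []

    # Constraint 2: every task is scheduled (a dict holds one entry per task).
    for u in tasks:
        if u not in schedule:
            violations.append(f"task {u} is never executed")

    # Constraint 4: one uninterrupted interval whose length is the execution time.
    for u, (v, start, finish) in schedule.items():
        if abs((finish - start) - p(u, v)) > eps:
            violations.append(f"task {u} is not run continuously on {v}")

    # Constraint 1: intervals assigned to the same VM must not overlap.
    by_vm = {}
    for u, (v, start, finish) in schedule.items():
        by_vm.setdefault(v, []).append((start, finish, u))
    for v, jobs in by_vm.items():
        jobs.sort()
        for (_, f1, u1), (s2, _, u2) in zip(jobs, jobs[1:]):
            if s2 < f1 - eps:
                violations.append(f"VM {v} runs tasks {u1} and {u2} at the same time")

    # Constraint 3: a successor starts only after each predecessor's data arrives.
    for j, (r, start, _) in schedule.items():
        for i in pred.get(j, []):
            if i not in schedule:
                continue  # already reported under constraint 2
            v, _, finish_i = schedule[i]
            if start < finish_i + q(i, j, v, r) - eps:
                violations.append(f"task {j} starts before data from task {i} arrives")

    return violations
```

Because a dictionary keyed by task cannot hold two executions of the same task, constraint 2 reduces here to checking that no task is missing.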
4.1. Mathematical model 1
The ILP solves the scheduling problem through the binary variables x and y:
• $x_{u,t,v}$: binary variable that takes the value 1 if task $u \in U$ finishes at time $t \in T$ on VM $v \in V$, and 0 otherwise;