Studying and Developing a CDN Caching Algorithm Using Machine Learning

VIETNAM NATIONAL UNIVERSITY HO CHI MINH CITY
HO CHI MINH CITY UNIVERSITY OF TECHNOLOGY
FACULTY OF COMPUTER SCIENCE AND ENGINEERING

GRADUATION THESIS

STUDYING AND DEVELOPING A CDN CACHING ALGORITHM USING MACHINE LEARNING

Major: Computer Science
Council: Computer Science
Instructor: Assoc. Prof. Thoai Nam
Reviewer: Dr. Nguyen Le Duy Lai
Student 1: Tran Trung Quan - 1752044
Student 2: Pham Trong Nhan - 1752394

HO CHI MINH CITY, 08/2021

Acknowledgement

We would like to express our deep and sincere gratitude to our advisor, Assoc. Prof. Thoai Nam, for his guidance, advice, enthusiasm, and encouragement in helping us research and implement this study.

We would like to extend our thanks to two postgraduate students from the High-Performance Computing Lab, Mr. Tran Ngoc Anh Tu and Mr. La Hoang Loc, for their assistance in directing and implementing this research.

We would also like to thank the lecturers in the Faculty of Computer Science and Engineering, Ho Chi Minh City University of Technology, for enthusiastically sharing their knowledge during our years at the university.

We are mindful that this project is still incomplete and contains inevitable mistakes. To improve it further, we would welcome feedback from the lecturers.

Finally, we wish you health, prosperity, and success on your chosen paths.

Declaration

Content Delivery Networks are not a new topic, but they remain a challenge due to the growth of digital content and network infrastructure. Vietnam currently does not have much in-depth research on this subject. Much of the knowledge required for this study is not part of the university-level curriculum, but we declare that this is our own research, carried out under the guidance of Assoc. Prof. Thoai Nam. The research content and results are legitimate and have never been published before. The data used for analysis and feedback was collected by us from several different sources and is indicated in the reference section. Besides, we have also used
some reviews, evaluations, and figures from other authors and organizations. All have citations and annotations. We are entirely responsible for the content of our research. The Ho Chi Minh City University of Technology is not involved in any copyright infringement caused by our research.

Contents

1 Introduction
1.1 Overview
1.2 Goal
1.3 Scope
1.4 Thesis Structure
2 Knowledge Base
2.1 Content Delivery Network
2.1.1 Introduction
2.1.2 CDN Taxonomy
2.1.3 Content Placement
2.1.4 Request Routing
2.1.5 Open Issues in CDNs
2.2 A Light-weight Content Distribution Scheme for Cooperative Caching in Telco-CDNs [19]
2.2.1 Approach
2.2.2 Model of Light-weight Content Distribution for Cooperative caching
2.2.3 Color-based caching scheme's Evaluation
2.3 Color-based Cooperative Cache and its Routing Scheme for Telco-CDNs [24]
2.3.1 Approach
2.3.2 Color Tag Based Cooperative Caching, Color Tags Management Algorithm and Routing Algorithm
2.3.3 Research Result
2.4 Emulation of the color-based caching scheme in Telco-CDNs with Mininet using real data [29]
2.4.1 Analyze CDN real log from SBD Inc - Workload types
2.4.2 Trace-based system analysis
2.4.3 Development CDN Emulation Tool
2.5 Simulated Annealing [32]
2.5.1 Local Search
2.5.2 Basic Simulated Annealing
2.5.3 Mathematical Modeling
2.5.4 Finding minimal traffic time of color-based cooperative caching in Telco-CDNs by using Simulated Annealing algorithm
2.6 Bayesian Optimization
2.6.1 Overview
2.6.2 Introduction to Bayesian Optimization
2.6.3 Model of the function: Gaussian Processes
2.6.4 Choice of kernels
2.6.5 Acquisition functions
3 Proposed Solution
3.1 Analysis: Current Problem with Normal Separator Rank Algorithm
3.1.1 Problems
3.2 Proposed Solution
3.2.1 Finding minimal traffic time of color-based cooperative caching in Telco-CDNs by using Bayesian Optimization
4 Evaluation
4.1 First Phase: Evaluation with Simulated Datasets
4.1.1 Setting
4.1.2 Experiment 1
4.1.3 Experiment 2
4.1.4 Experiment 3
4.1.5 Experiment 4
4.1.6 Experiment 5
4.1.7 Summary
4.2 Second Phase: Evaluation with Real Datasets
4.3 Conclusion
5 Summary
5.1 Achievements
5.1.1 Knowledge about CDN system and related problems
5.1.2 Using Bayesian Optimization to solve the optimization problems
5.1.3 Achievements in experiences, knowledge, and soft skills
5.2 Drawbacks
5.3 Future Improvements

List of Tables

2.1 Example of a record in system log
3.1 Sampling set of different increased rank number
3.2 Number of separator ranks in different sampling sets
4.1 Setting for evaluating with Simulated Datasets
4.2 Initial sampling separator ranks

List of Figures

2.1 CDN Composition
2.2 CDN Composition - Relationship
2.3 CDN Composition - Interaction Protocols
2.4 Content Distribution and Management
2.5 Surrogate placement
2.6 Content management subsystem
2.7 End-user to CDN interaction
2.8 Utilities for modeling objectives and constraints
2.9 Taxonomy of request-routing mechanisms
2.10 Example of contents cached in three servers according to their color tags and popularities [19]
2.11 Proposed LFU-LRU Hybrid Cache Architecture [19]
2.12 Request handling algorithm with proposed hybrid caching scheme [19]
2.13 Server colorization algorithm [19]
2.14 Popularity class and corresponding tags with four colors [19]
2.15 Unidirectional ring topology with nodes with their corresponding colorations [19]
2.16 2D-mesh topology with two different colorations [19]
2.17 NTT mesh topology in Japan and its coloration [19]
2.18 Simulation parameters [19]
2.19 The number of contents in each popularity class [19]
2.20 Configurations of the computing host for GA [19]
2.21 Comparison of convergence [19]
2.22 Normalized traffic on a ring-based network with nodes [19]
2.23 Normalized traffic on a 2D-mesh network with 25 nodes [19]
2.24 Normalized traffic on the
NTT network with 55 nodes [19]
2.25 Cache hit ratio with a change in the popularity [19]
2.26 Separator ranks for each gamma parameter when 1000 contents are classified into five popularity classes [24]
2.27 Iterative calculation for separator ranks algorithm [24]
2.28 Color-based routing that finds the nearest cache server with the requesting color tag [24]
2.29 Structure of the color-tag based cache server [24]
2.30 Color based routing algorithm [24]
2.31 NTT-like mesh topology and its coloration with four colors [24]
2.32 Traffic reduction under different routing and caching strategies [24]
2.33 Video Streaming workflow [29]
2.34 The proportion of number of requests and content size for Multipurpose Internet Mail Extensions [29]
2.35 Upper: General hit rate for services, Lower: Sum latency for services [29]
2.36 Components of the tool [29]
2.37 The tool's workflow [29]
2.38 Request-response flow [29]
2.39 Number of contents requested in each interval in CDN
2.40 T comparison between Simulated Annealing and Original algorithms
2.41 T est num run comparison between Simulated Annealing and Original algorithms
3.1 Test function of original algorithm with increased rank =
3.2 Test function of original algorithm with separator rank (Increased rank = 1)
3.3 Test function of original algorithm with increased rank =
4.1 The topology of France CDN network
4.2 Evaluation Bayesian Optimization (BO) method with original method (Sampling set: "all-rank", Increased rank = 1)
4.3 Evaluation different initial sampling points of Bayesian Optimization (Sampling set: "all-rank", Increased rank = 1)
4.4 Compare three acquisition functions (Sampling set: "all-rank", Increased rank = 1, Initial sampling points: 200)
4.5 Compare two sampling sets "all-rank" and "only-sorted" (Increased rank = 1, Initial sampling points: 200)
4.6 Compare different initial sampling points of "only-sorted" sampling set (Increased rank = 1)
4.7 Compare increased
rank = 1 and increased rank = 4 (Sampling set: "all-rank", Initial sampling points: 200, Acquisition function: MPI)
4.8 Compare increased rank = 1 and increased rank = 4 (Sampling set: "only-sorted", Initial sampling points: 200, Acquisition function: MPI)
4.9 Evaluation minimum estimate traffic time through 24 intervals between Bayesian Optimization (BO) and Original method (Acquisition function: MPI, Search space: "all-rank", Initial sampling point = 200, Increase rank = 1)
4.10 Evaluation number of executing Test function between Bayesian Optimization (BO) and Original method through 24 intervals (Acquisition function: MPI, Search space: "all-rank", Initial sampling point = 200, Increase rank = 1)
4.11 Compare the difference percentage with the original method through 24 intervals of different sampling sets (Initial sampling points: 200, Acquisition function: MPI, Increase rank = 1)

Chapter 1 Introduction

1.1 Overview

Due to the growth of network infrastructure and digital content, Internet traffic has increased rapidly over the years. At the same time, the number of people consuming web-based information and services is rising exponentially. A study [1] has estimated that Internet traffic will increase threefold over the coming years. By 2021, video traffic, driven by Video-on-Demand (VoD) services, was expected to account for more than 82% of all Internet traffic. As this enormous traffic creates many congested connections and degrades network performance, VoD providers typically place their content on Content Delivery Networks (CDNs).

Content Delivery Networks have emerged to overcome Internet congestion and overload by offering infrastructure and mechanisms to deliver content and services in a scalable manner. CDN applications can be found in many industries, such as research institutions, media advertising, data centers, Internet Service Providers (ISPs), e-commerce, network carriers, and other carrier businesses.

Although CDNs could reduce video
traffic, their servers are usually located in different network locations. Even though several CDN providers place their cache servers on ISP networks, this method still does not reduce traffic considerably [2]. The reason is that the servers are only in limited locations and CDN providers have no global knowledge of the underlying network.

Several ISPs are planning to build their own CDNs by placing cache servers on their networks; these are called Telco-CDNs [2], [3]. They reduce the traffic on the peering links as well as on internal communication links by confining video requests to their networks. However, because of limited storage space and inefficient allocation of contents, the traffic reduction achieved is not sufficient. Moreover, the objective of a Telco-CDN is to improve the efficiency of both the overlay and the underlying network infrastructure, which differs from that of a conventional CDN.

To address this problem, recent studies aim to minimize traffic by increasing cache capacity. They use a cooperative caching strategy, which enlarges the effective cache storage by sharing contents among cache servers. Li et al. [2] proposed a content allocation algorithm based on a Genetic Algorithm (GA) that consistently gives a sub-optimal solution in terms of network traffic. Although such strategies could eliminate a large amount of traffic, the time it takes to compute the solution on a cluster is especially long. Such a long calculation leads to cache allocation inconsistencies, as access patterns vary by 20-60% per hour [4].

To overcome the limitations of the GA, Nakajima et al. [19], [24] proposed a light-weight color-based caching scheme using cooperative caching [2], [5] and hybrid caching [6]. The scheme focuses on two important factors of traffic reduction, content distribution and the duplication of popular content, by grouping caches and servers using a novel color tag scheme. These color tags are efficiently distributed to the caches and
servers through a lightweight color distribution scheme.

Even though the scheme reduces the computation time while keeping a solution close to that of the GA in terms of network traffic, the time it needs to find the color separator ranks is still long when the number of content categories increases. The color distribution scheme is roughly equivalent to a brute-force strategy, so some of its calculations are unnecessary.

In this research, we propose a solution based on Bayesian Optimization to reduce the calculation time of determining the color separator ranks. We then evaluate it on simulated and real datasets to determine the best-fitting parameters of the Bayesian Optimization approach for our problem, based on a comparison with the original color separator rank finding algorithm.

1.2 Goal

The main objective of this research is to improve the computation time of the color separator rank finding algorithm using Bayesian Optimization and to evaluate it against the previous research and other solutions. To achieve this goal, we plan to carry out the following tasks:
• Study CDNs and their related properties and problems
• Study the light-weight color-based caching scheme using cooperative caching and hybrid caching
• Study Bayesian Optimization
• Apply the Bayesian Optimization solution to solve the computation time problem
• Evaluate our solution against previous research and other algorithms

1.3 Scope

Although the color-based cooperative caching scheme is a new scheme for content distribution and delivery in CDN systems, it still has many shortcomings that need improvement. In our thesis, we concentrate only on re-implementing the content color distribution algorithm so that it computes faster than the previous one. Additionally, we accept that the new method gives an approximate solution compared to the old one. In general, we will examine methods that are more efficient in
terms of computing time while remaining within a certain range of acceptable error in comparison to the original answer.

1.4 Thesis Structure

In Chapter 2, we study CDNs, color-based caching schemes, and the theoretical background required for this research. In Chapter 3, we discuss the details of our proposed Bayesian Optimization approach. In Chapter 4, we evaluate our solution to find the best setting for the Bayesian Optimization method and compare it with the original solution. In the final chapter, we conclude and outline the plan for the next stage.

4.1.2 Experiment 1

Figure 4.2: Evaluation Bayesian Optimization (BO) method with original method (Sampling set: "all-rank", Increased rank = 1)

In the first experiment, shown in Figure 4.2, we first check whether the Bayesian Optimization method is effective compared with the original algorithm. Each data point is the estimated traffic time obtained when the algorithm runs the Test function with a specific separator rank. The desired result is a lower estimated traffic time with as few executions of the Test function as possible. The initial results demonstrate that the Bayesian Optimization approach produces a better outcome with all three acquisition functions, Maximum Probability of Improvement (MPI), Expected Improvement (EI), and Lower Confidence Bound (LCB), in terms of both the estimated traffic time and the number of Test function executions.

4.1.3 Experiment 2

In the next experiment, we compare different numbers of initial sampling points used to build the Gaussian Process (GP) model for our Test function.

Figure 4.3: Evaluation different initial sampling points of Bayesian Optimization (Sampling set: "all-rank", Increased rank = 1)

Figure 4.3 shows that more initial sampling points tend to give a better estimated traffic time for all three acquisition functions. From 70 initial sampling points onwards, the trend in estimated traffic time stayed the same. With 200 initial sampling points, the estimated traffic
time tends to go lower. Therefore, beyond a certain number of initial sampling points, adding more still tends to give a smaller estimated traffic time.

4.1.4 Experiment 3

For the convenience of later experiments, we evaluate which acquisition function performs best for our model among Maximum Probability of Improvement (MPI), Expected Improvement (EI), and Lower Confidence Bound (LCB). To carry out this experiment, we ran the simulation ten times. In each run, the three acquisition functions were executed with the same seed for the random initial sample points, and the seed differed from those of the other runs. Based on the results of Experiment 2, we used 200 initial sampling points with the "all-rank" sampling set and increased rank = 1. To depict the achieved result, which is the minimum estimated traffic time in each of the ten simulations, we use a box plot.

Figure 4.4: Compare three acquisition functions (Sampling set: "all-rank", Increased rank = 1, Initial sampling points: 200)

The box plot represents important quantities of a series of values, such as the minimum, maximum, quartiles, and interquartile range. To evaluate the three acquisition functions, we first compared their medians (the red line), with lower values being preferred. According to the diagram, all three acquisition functions have the same median, so the difference between them is not significant. Second, we examined the interquartile ranges (box lengths), which equal the difference between the 75th and 25th percentiles, to compare dispersion; a shorter and lower box is better. As the diagram shows, the MPI acquisition function appears to have the best interquartile range of the three. Finally, we compared the overall spread, given by the maximum and minimum values. All three acquisition functions have the same minimum value, and MPI has the lowest maximum value (lower is better). Therefore, MPI has the smallest overall spread.

To summarize, the MPI
acquisition function tends to give a better estimated traffic time across different sets of initial sampling points. As shown in the above figure and in some extra simulations we ran, the LCB acquisition function shows some potential outliers (the lowest white point in the figure), but we choose MPI for the general case.

4.1.5 Experiment 4

In the following experiments, we start evaluating our greedy solutions by comparing two sampling sets: "all-rank" and "only-sorted". Following the results of the previous two experiments, we use the MPI acquisition function with 200 initial sampling points and increased rank = 1.

Figure 4.5: Compare two sampling sets "all-rank" and "only-sorted" (Increased rank = 1, Initial sampling points: 200)

The result shows that the "only-sorted" sampling set gives a lower estimated traffic time than the "all-rank" sampling set. Furthermore, the "only-sorted" sampling set gives a good result right from the initial sampling points. For clarification, we compare different numbers of initial sampling points for the "only-sorted" sampling set in the figure below:

Figure 4.6: Compare different initial sampling points of "only-sorted" sampling set (Increased rank = 1)

The figure above shows that, from only 10 initial sampling points onward, the result tends to be steady. This can be used to trade off a better estimated traffic time against fewer executions of the Test function. Overall, however, as in Experiment 2, 200 initial sampling points still give a better estimated traffic time.

4.1.6 Experiment 5

To test the effect of the "increased rank" value, we compare the two sampling sets, "all-rank" and "only-sorted", with increased rank = 1 and increased rank = 4.

Figure 4.7: Compare increased rank = 1 and increased rank = 4 (Sampling set: "all-rank", Initial sampling points: 200, Acquisition function: MPI)

Figure 4.8: Compare increased rank = 1 and increased rank = 4 (Sampling set: "only-sorted", Initial
sampling points: 200, Acquisition function: MPI)

The results in the above figures show that the "all-rank" sampling set gives a lower estimated traffic time with increased rank = 4 than with increased rank = 1. The "only-sorted" sampling set likewise produces a better result with increased rank = 4 than with increased rank = 1, but the difference is minor.

4.1.7 Summary

To summarize our findings from the simulated data, we compared the original method with the Bayesian Optimization approach using various sampling sets. Compared to the original algorithm, the Bayesian Optimization method requires fewer executions of the Test function while still maintaining an acceptable or even better estimated traffic time. For the Bayesian Optimization method, MPI is the best of the three evaluated acquisition functions. In addition, using more initial sampling points to construct the GP model seemed to yield a better result; in our evaluation, the best setting was 200 initial sampling points. Furthermore, our proposed greedy sampling set, referred to as "only-sorted", contains only separator ranks whose popularity classes differ by less than 10 from each other, and we also raised the increased rank to 4. Both changes have a beneficial effect on the final outcome.

4.2 Second Phase: Evaluation with Real Datasets

In the second phase, we evaluate using records of a VoD service from the real data and configuration mentioned in Section 2.4. We used all four sampling sets, A1, A4, O1, and O4, to check whether the findings from the simulated data hold. The main Bayesian Optimization configuration compared with the original method, however, uses the "all-rank" sampling set with 200 initial sampling points and increased rank = 1, as shown in the figures below:

Figure 4.9: Evaluation minimum estimate traffic time through 24 intervals between Bayesian Optimization (BO) and Original method
(Acquisition function: MPI, Search space: "all-rank", Initial sampling point = 200, Increase rank = 1)

Figure 4.10: Evaluation number of executing Test function between Bayesian Optimization (BO) and Original method through 24 intervals (Acquisition function: MPI, Search space: "all-rank", Initial sampling point = 200, Increase rank = 1)

Figure 4.9 shows that more content in an interval results in a larger estimated traffic time, and that the Bayesian Optimization method gives a result that is not significantly inferior to the original algorithm, with a maximum loss of only 0.1% and sometimes even better results. The number of Test function executions for the Bayesian Optimization method is fixed at 230, because it is the sum of the initial sampling points (200) and the number of iterations (30). In the evaluation, the Bayesian Optimization method showed an improvement of up to nearly 670% over the original method in this respect. To show the difference from the original algorithm across the 24 intervals and to compare the sampling sets, we use a box plot, shown below:

Figure 4.11: Compare the difference percentage with the original method through 24 intervals of different sampling sets (Initial sampling points: 200, Acquisition function: MPI, Increase rank = 1)

The figure shows that the difference between the Bayesian Optimization method and the original method fluctuates in the range from −0.05% to +0.1% when the outliers are ignored. In addition, the "only-sorted" sampling set showed a better interquartile range than the "all-rank" sampling set, and its overall spread is also lower. Besides, increased rank = 4 is not as good as expected, but it is still good in some cases, as the outliers show.

4.3 Conclusion

In this evaluation, we have compared our proposed Bayesian Optimization method with the original method. In the Bayesian Optimization approach, we evaluated different acquisition functions, sampling
sets, and increased rank steps to find the best fit for our problem. The results on the simulated datasets show that our proposed Bayesian Optimization method and greedy solution reduce the number of Test function executions compared with the original method while still achieving an acceptable minimum estimated traffic time. Furthermore, on real datasets, the findings show an improvement of up to 670% in the number of Test function executions, while the final minimum estimated traffic time stays within 0.1% of that of the original method.

Chapter 5 Summary

The Bayesian Optimization method has proven to be highly effective for improving the computation time of the Color-Based Caching Scheme in Telco-CDNs.

5.1 Achievements

5.1.1 Knowledge about CDN system and related problems

This thesis relies heavily on an understanding of the Content Delivery Network system, which takes a significant amount of time and effort to research. We investigated the Content Delivery Network system and related challenges, as well as how to combine data from several sources. Moreover, we learned about the Color-Based Caching Scheme in Telco-CDNs and its long computation time, which is the primary problem we set out to solve.

5.1.2 Using Bayesian Optimization to solve the optimization problems

Our accomplishment in this research is the ability to use Bayesian Optimization to optimize the color separator rank finding algorithm. As a result, the Color-Based Caching Scheme's computation time is considerably reduced. Furthermore, due to noise from too many data points in the domain, the original Bayesian Optimization approach can perform inefficiently. Therefore, we have proposed and evaluated solutions that shrink the data domain while making the final result easier to find.

5.1.3 Achievements in experiences, knowledge, and soft skills

We have learned a lot in the process of working on this
thesis for months. We also learned how to learn, research, collaborate, and solve problems. What we have accomplished in this thesis is quite promising in terms of what we can expect going forward.

5.2 Drawbacks

Our results are only a preliminary evaluation of one particular Color-Based Caching Scheme setting. As a result, the parameters for Bayesian Optimization may be meaningful solely in that setting. Several aspects can affect the Color-Based Caching Scheme, but we have not been able to test them yet:
• CDN network topology: For different CDN network topologies, the content distribution is also different, thereby increasing the computation time. The old BO settings may no longer be suitable.
• Number of contents: A larger or smaller number of contents can also change the separator rank for the same network configuration.
• Number of requests: The number of requests also affects the final outcome because the likelihood of getting the contents and the hit rate can change.
• Routing algorithm: There are two routing algorithms: the Color-based Routing of the color-based caching algorithm and the traditional Shortest Path Routing. We have only evaluated the Color-based Routing algorithm.
• Caching algorithm: There are four caching algorithms: FIFO, LFU, LRU, and Hybrid Cache. Combined with the two routing algorithms, this gives four routing modes: full-color, tag-color, no-color, and no-cache. Our evaluation only used the full-color mode with color-based routing and the hybrid cache algorithm.
• Real CDN data: We evaluated only a limited number of real datasets, so we cannot be sure that our strategy works effectively on other datasets. Likewise, because we focused primarily on VoD content, the approach or settings may not be appropriate for other content types.

5.3 Future Improvements

The Bayesian Optimization approach produced promising results within our scope, based on preliminary evaluations. We were unable to carry out a more accurate evaluation due to data limits, time constraints, and hardware
limitations. Here is what we can do to improve and enhance this approach:
• Continue to address the drawbacks mentioned in Section 5.2 by evaluating different configurations, different topologies, and more real datasets.
• Improve Bayesian Optimization more deeply by analyzing alternative prior models and tuning the model with different configurations, so that the most efficient model can be selected. Also, analyze the acquisition functions in order to make suitable adjustments.

References

[1] Cisco, “The Zettabyte Era: Trends and Analysis,” updated 07/06/2017, 2017.
[2] Z. Li and G. Simon, “In a telco-cdn, pushing content makes sense,” IEEE Transactions on Network and Service Management, vol. 10, no. 3, pp. 300–311, 2013.
[3] D. De Vleeschauwer and D. C. Robinson, “Optimum caching strategies for a Telco CDN,” Bell Labs Technical Journal, vol. 16, no. 2, pp. 115–132, Sept. 2011.
[4] H. Yu, D. Zheng, B. Y. Zhao, and W. Zheng, “Understanding user behavior in large-scale video-on-demand systems,” Proc. 1st ACM SIGOPS/EuroSys European Conference on Computer Systems 2006, pp. 333–344, 2006.
[5] N. Choi, K. Guan, D. C. Kilper, and G. Atkinson, “In-network caching effect on optimal energy consumption in content-centric networking,” in 2012 IEEE International Conference on Communications (ICC), IEEE, 2012, pp. 2889–2894.
[6] Y. Zhou, L. Chen, C. Yang, and D. M. Chiu, “Video popularity dynamics and its implication for replication,” IEEE Transactions on Multimedia, vol. 17, no. 8, pp. 1273–1285, 2015.
[7] M. Pathan, J. Broberg, K. Bubendorfer, K. H. Kim, and R. Buyya, “An architecture for virtual organization (VO)-based effective peering of content delivery networks,” UPGRADE-CN'07, in Proc. of the 16th IEEE International Symposium on High Performance Distributed Computing.
[8] O. Katz, R. Perets, and G. Matzliach, “Digging Deeper?
An In-Depth Analysis of a Fast Flux Network,” Akamai White Paper, 2017.
[9] Akamai and Juniper, “The Elastic CDN Solution,” Solution Brief, 3510532-001-EN, Dec. 2014.
[10] https://peering.google.com/#/infrastructure
[11] J. Leguay, G. Paschos, E. Quaglia, and B. Smyth, “CryptoCache: Network Caching with Confidentiality,” in Proc. of IEEE ICC, 2017.
[12] A. Berglund, “How Netflix Works With ISPs Around the Globe to Deliver a Great Viewing Experience,” Netflix blog, 2016.
[13] A. Vakali and G. Pallis, “Content delivery networks: status and trends,” IEEE Internet Computing, 7(6), IEEE Computer Society, pp. 68–74, 2003.
[14] F. Douglis and M. F. Kaashoek, “Scalable Internet services,” IEEE Internet Computing, 5(4), pp. 36–37, 2001.
[15] G. Pallis and A. Vakali, “Insight and perspectives for content delivery networks,” Communications of the ACM, 49(1), ACM Press, NY, USA, pp. 101–106, 2006.
[16] M. Pathan and R. Buyya, “A taxonomy of CDNs,” in Content Delivery Networks (Lecture Notes Electrical Engineering), Berlin, Germany: Springer, vol 2008, pp. 33–37.
[17] G. Pallis, A. Vakali, K. Stamos, A. Sidiropoulos, D. Katsaros, and Y. Manolopoulos, “A latency-based object placement approach in content distribution networks,” in Proc. 3rd Latin Amer. Web Congr. (LA-WEB), Buenos Aires, Argentina, Nov. 2005, p
[18] W.-X. Liu, S.-Z. Yu, G. Tan, and J. Cai, “Information-centric networking with built-in network coding to achieve multisource transmission at network-layer,” Comput. Netw., vol. 115, no. 3, pp. 110–128, 2017.
[19] T. Nakajima, M. Yoshimi, C. Wu, and T. Yoshinaga, “A light-weight content distribution scheme for cooperative caching in Telco-CDNs,” Proc. Fourth International Symposium on Computing and Networking (CANDAR '16), pp. 126–132, 2016.
[20] H. J. Kushner, “A new method of locating the maximum of an arbitrary multipeak curve in the presence of noise,” J. Basic Engineering, 86:97-106, 1964.
[21] J. Močkus, V. Tiesis, and A. Zilinskas, Toward Global Optimization, volume 2, chapter The Application of Bayesian Methods for Seeking
the Extremum, pages 117-128. Elsevier, 1978.
[22] D. Lizotte, Practical Bayesian Optimization. PhD thesis, University of Alberta, Edmonton, Alberta, Canada, 2008.
[23] N. Srinivas, A. Krause, S. M. Kakade, and M. Seeger, “Gaussian process optimization in the bandit setting: No regret and experimental design,” Proc. International Conference on Machine Learning (ICML), 2010.
[24] T. Nakajima, M. Yoshimi, C. Wu, and T. Yoshinaga, “Color-based cooperative cache and its routing scheme for Telco-CDNs,” IEICE Transactions on Information and Systems, 100(12), 2847-2856 (2017).
[25] E. W. Dijkstra, “A note on two problems in connexion with graphs,” Numer. Math., vol. 1, no. 1, pp. 269–271, Dec. 1959.
[26] W. Li, Y. Li, W. Wang, Y. Xin, and Y. Xu, “A collaborative caching scheme with network clustering and hash-routing in CCN,” 27th Annual IEEE International Symposium on Personal, Indoor and Mobile Radio Communications, pp. 2547–2553, Sept. 2016.
[27] G. Rossini and D. Rossi, “Coupling caching and forwarding: Benefits, analysis, and implementation,” Proc. 1st ACM Conference on Information-Centric Networking, pp. 127–136, 2014.
[28] Anh-Tu Ngoc Tran, Thanh-Dang Diep, Takuma Nakajima, Masato Yoshimi, and Nam Thoai, “A Scalable Color-Based Caching Scheme in Telco-CDNs,” 2019 15th International Conference on Network and Service Management (CNSM).
[29] Nam Thoai, “Emulation of the color-based caching scheme in Telco-CDNs with Mininet using real data.”
[30] Konstantinos Stamos, George Pallis, Athena Vakali, Dimitrios Katsaros, Antonis Sidiropoulos, and Yannis Manolopoulos, “CDNsim: A simulation tool for content distribution networks,” ACM Trans. Model. Comput. Simul. 20, 2, Article 10 (April 2010).
[31] OMNeT++, https://omnetpp.org/, [Online; accessed April 2020].
[32] Edmund K. Burke and Graham Kendall, “Search Methodologies: Introductory Tutorials in Optimization and Decision Support Techniques,” Second Edition, 2014.
[33] N. Handigol et al., “Reproducible network experiments using container-based emulation,” in Proc. of ACM CoNEXT'12.
[34]
Z. Li and G. Simon, “In a telco-cdn, pushing content makes sense,” IEEE Transactions on Network and Service Management, vol. 10, pp. 300–311, 09 2013.

... pattern bias with a few seconds of the expected time

2.3.2 Color Tag Based Cooperative Caching, Color Tags Management Algorithm and Routing Algorithm

Color Tag Based Cooperative Caching: Nakajima and...

Cooperative caching: Researchers led by Nakajima et al. proposed a color-based caching algorithm for Telco-CDN networks, which merges cooperative caching and hybrid caching...

... Simulated Annealing and Original algorithms. From the evaluation graphs above, we can conclude that, to obtain a similar result under the same conditions, the Simulated Annealing algorithm has a high...
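The three acquisition functions compared in Section 4.1.4 (MPI, EI, and LCB) can be illustrated with a minimal sketch. It assumes a minimization objective (the estimated traffic time) and a Gaussian posterior with mean `mu` and standard deviation `sigma` at each candidate separator rank. All function names, the margin `xi`, and the weight `kappa` are illustrative assumptions and are not taken from the thesis implementation.

```python
import math


def normal_pdf(z):
    """Standard normal probability density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mpi(mu, sigma, best, xi=0.0):
    """Probability that a candidate beats `best` by at least `xi` (minimization)."""
    if sigma <= 0.0:
        return 1.0 if mu < best - xi else 0.0
    return normal_cdf((best - xi - mu) / sigma)


def ei(mu, sigma, best, xi=0.0):
    """Expected amount by which a candidate improves on `best` (minimization)."""
    gain = best - xi - mu
    if sigma <= 0.0:
        return max(gain, 0.0)
    z = gain / sigma
    return gain * normal_cdf(z) + sigma * normal_pdf(z)


def lcb(mu, sigma, kappa=2.0):
    """Optimistic lower bound on the objective; smaller is more promising."""
    return mu - kappa * sigma


def pick_next(candidates, best, rule="mpi"):
    """Return the index of the candidate the chosen rule would evaluate next.

    `candidates` is a list of (mu, sigma) pairs from a hypothetical GP posterior.
    """
    if rule == "mpi":
        scores = [mpi(m, s, best) for m, s in candidates]
        return max(range(len(scores)), key=scores.__getitem__)
    if rule == "ei":
        scores = [ei(m, s, best) for m, s in candidates]
        return max(range(len(scores)), key=scores.__getitem__)
    if rule == "lcb":
        scores = [lcb(m, s) for m, s in candidates]
        return min(range(len(scores)), key=scores.__getitem__)
    raise ValueError(f"unknown rule: {rule}")
```

With the made-up candidates `[(10.0, 0.1), (9.0, 1.0), (12.0, 5.0)]` and a current best of `10.0`, MPI selects the second candidate (a likely but small improvement), while EI and LCB select the third (a highly uncertain one). This reflects the general tendency of MPI to exploit more than the other two rules; it says nothing about which rule is best for the thesis workload.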

Title	Studying and Developing a CDN Caching Algorithm Using Machine Learning
Authors	Tran Trung Quan, Pham Trong Nhan
Supervisor	Assoc. Prof. Thoai Nam
University	Ho Chi Minh City University of Technology
Major	Computer Science
Type	Graduation Thesis
Year of publication	2021
City	Ho Chi Minh City

Format
Number of pages	77
File size	1.97 MB