Fault Management in Inter-Cloud System

FAULT MANAGEMENT IN INTER-CLOUD SYSTEM

In Partial Fulfillment of the Requirements of the Degree of
MASTER OF INFORMATION TECHNOLOGY MANAGEMENT
In Information Management

By Mr. Long Ngoc Hoang
ID: MITM03006

International University - Vietnam National University HCMC
May 2015

Under the guidance and approval of the committee, and approved by all its members, this thesis has been accepted in partial fulfillment of the requirements for the degree.

Approved by:
Dr. Sinh Van Nguyen, Chairperson
Dr. Ha Manh Tran, Thesis Supervisor
Dr. Sang Thi Thanh Nguyen, Thesis Committee
Dr. Thai Duc Nguyen, Thesis Committee
Dr. Phuong Luu Vo, Thesis Committee

Acknowledgements

This thesis concludes my Master of Information Technology Management degree and is submitted to the School of Computer Science and Engineering at the International University, Vietnam National University - Ho Chi Minh City.

I would like to express my deepest gratitude to Dr. Ha Manh Tran for his guidance and helpful advice. His skilful and valuable comments and feedback helped me get back to the thesis work whenever I lost focus on its objectives due to the nature of my business.

Last but not least, I would like to thank my family for their unconditional support and encouragement, and my friends for their valuable feedback.

Plagiarism Statement

I declare that, apart from the acknowledged references, this thesis does not use language, ideas, or other original material from anyone else, and has not been previously submitted to any other educational or research program or institution. I fully understand that any writing in this thesis that contradicts the above statement will automatically lead to rejection from the Master of Information Technology program at the International University - Vietnam National University Ho Chi Minh City.

Copyright Statement

This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognize that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without the author's prior consent.

© Long Ngoc Hoang - MITM03006 - 2015

Table of Contents

1 Introduction
2 Literature Review
  2.1 Fault
  2.2 Event Correlation
    2.2.1 Event Correlation Techniques
    2.2.2 Existing Open Source Event Correlation Software
  2.3 Related Cloud and Fault Management Software
  2.4 Machine Learning
    2.4.1 Feature Extraction from Logs
  2.5 OpenStack
  2.6 Hadoop
3 Proposal
4 Experiment
  4.1 Use existing open source tools for monitoring and correlating logs
    4.1.1 Setup OpenStack
    4.1.2 Setup Ganglia
    4.1.3 Setup Hadoop and OpenStack on Windows Azure
    4.1.4 OpenStack Log Collection and Processing
    4.1.5 OpenStack Database Tables
5 Conclusion and Future Work
A Setup and Configuration
  A.1 Setup OpenStack with Fuel
  A.2 Setup and Configure Ganglia
  A.3 Logstash Configuration

List of Figures

2-1 A taxonomy of faults [1]
2-2 A taxonomy for online failure prediction approaches [2]
2-3 Fault management on inter-cloud environment
2-4 OpenStack conceptual architecture [3]
2-5 OpenStack logical architecture [3]
2-6 Devstack's localrc for controller node (192.168.1.5)
2-7 Devstack's localrc for compute node (192.168.1.6)
2-8 nova-manage service list
2-9 Launch an instance from Horizon dashboard
2-10 MapReduce workflow
2-11 Hadoop ecosystem
2-12 HCatalog - table list
2-13 HCatalog - batting_data table
2-14 HCatalog - master_data table
2-15 Hive query
2-16 Hive query result
2-17 Hive query log
2-18 Pig query
2-19 Pig query result
2-20 Pig query log
3-1 Fault analyzer in the fault resolution system
3-2 Log management model
3-3 OpenStack Log Analysis Block Diagram [4]
3-4 Monitoring and Alerting for OpenStack [5]
4-1 Critical issue from cinder-scheduler service
4-2 Error from nova-compute service
4-3 OpenStack Nova log files on controller node
4-4 Error from savanna-api log
4-5 Node Overview
4-6 Summary Node Metric Last Hour
4-7 CPU Metrics
4-8 Disk Metrics
4-9 Load Metrics
4-10 Memory Metrics
4-11 Network and Process Metrics
4-12 Ganglia metrics on Graphite
4-13 Windows Azure Virtual Network for Hadoop and OpenStack clusters
4-14 Hadoop cluster on Windows Azure
4-15 OpenStack Juno on Windows Azure
4-16 Logstash Histogram
4-17 OpenStack Log Type Summary
4-18 Query and filter OpenStack Logs
4-19 OpenStack Nova log
4-20 Error status of an instance on OpenStack Dashboard
4-21 Information from nova.instance_faults and nova.instances tables
4-22 Exception details from nova.instance_faults table
5-1 Log Analysis Workflow
A-1 Fuel Server
A-2 Fuel UI
A-3 Successful Havana Deployment on Fuel
A-4 OpenStack Havana Services

List of Tables

2.1 Advantages and drawbacks of the presented event correlation approaches
2.2 OpenStack services
2.3 OpenStack Log Location
4.1 OpenStack Cinder Log Files
4.2 OpenStack Nova Log Files
4.3 OpenStack Horizon Log Files
4.4 OpenStack Keystone Log Files
4.5 OpenStack Glance Log Files
4.6 OpenStack Ceilometer Log Files
4.7 OpenStack Heat Log Files
4.8 OpenStack Savanna Log Files

Appendix A Setup and Configuration

A.1 Setup OpenStack with Fuel

A minimal non-HA installation with Cinder (1 controller + compute + cinder) can be achieved by using Mirantis Fuel [97]. In order to successfully run Mirantis OpenStack under VirtualBox, we need to:
- download the official release (.iso) and place it under the 'iso' directory
- update the "config.sh" file to change the settings (number of OpenStack nodes, CPU, RAM, HDD)

Then run "./launch.sh" to pick up the ISO and spin up the master node and the slave nodes. Once the Fuel server (Figure A-1) is up and running, we can access the Fuel UI (Figure A-2) and deploy the OpenStack environment.

Figure A-1: Fuel Server

Figure A-2: Fuel UI

Once the Fuel deployment has run successfully, we can access the Horizon dashboard and verify the status of the OpenStack services, as in Figures A-3 and A-4.

Figure A-3: Successful Havana Deployment on Fuel

Figure A-4: OpenStack Havana Services

A.2 Setup and Configure Ganglia

Install the dependencies and the Round Robin Database tool (RRDtool):

    $ sudo apt-get install libcairo2-dev libpango1.0-dev libapr1-dev
    $ sudo apt-get install rrdtool librrd-dev

Create the directory where RRDtool data will be stored and make sure that RRDtool can write to it:

    $ mkdir -p /var/lib/ganglia/rrds
    $ chown nobody /var/lib/ganglia/rrds

Configure and install Ganglia 3.6.0 from source:

    $ tar xzvf ganglia-3.6.0.tar.gz
    $ cd ganglia-3.6.0
    $ ./configure --with-gmetad --prefix=/opt/ganglia
    $ make && make install
    $ gmond -t | tee /opt/ganglia/etc/gmond.conf
    $ cp gmetad/gmetad.conf /opt/ganglia/etc/gmetad.conf

Configure and install Ganglia Web 3.5.12 from source:

    $ sudo apt-get install php5 php5-common
    $ tar xzvf ganglia-web-3.5.12.tar.gz
    $ cd ganglia-web-3.5.12
    # Edit the Makefile and set GDESTDIR to /var/www/ganglia-web
    $ make install

Now we can start the gmond and gmetad services and observe the Ganglia metrics:

    $ sudo /opt/ganglia/sbin/gmond
    $ sudo /opt/ganglia/sbin/gmetad

A.3 Logstash Configuration

The snippet below shows a sample Logstash configuration file for processing the logs of the OpenStack Nova service. The full configuration can be found on the thesis CD-ROM.

    input {
      file {
        type => "nova"
        start_position => "beginning"
        path => [
          "/var/log/nova/nova-consoleauth.log",
          "/var/log/nova/nova-api.log",
          "/var/log/nova/nova-cert.log",
          "/var/log/nova/nova-conductor.log",
          "/var/log/nova/nova-manage.log",
          "/var/log/nova/nova.log",
          "/var/log/nova/nova-scheduler.log",
          "/var/log/nova/nova-objectstore.log"
        ]
      }
    }

    filter {
      grok {
        patterns_dir => "/patterns/"
        type => "nova"
        pattern => "%{TIMESTAMP_ISO8601:timestamp}%{AUDITLOGLEVEL:level} %{PROG:program}%{GREEDYDATA:message}"
      }
      multiline {
        type => "nova"
        # Lines that do not start with a date are appended to the previous event
        pattern => "^(([0-9]+-(?:0?[1-9]|1[0-2])-(?:3[01]|[1-2]?[0-9]|0?[1-9]))|((?:0?[1-9]|1[0-2])/(?:3[01]|[1-2]?[0-9]|0?[1-9]))).*$"
        negate => true
        what => "previous"
      }
    }

    output {
      elasticsearch {
        # Setting 'embedded' will run a
        # real elasticsearch server inside logstash
        embedded => true
      }
    }

Bibliography

[1] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic concepts and taxonomy of dependable and secure computing,” IEEE Trans. Dependable Secur. Comput., vol. 1, pp. 11–33, Jan. 2004.
[2] F. Salfner, M. Lenk, and M. Malek, “A survey of online failure prediction methods,” ACM Comput. Surv., vol. 42, pp. 10:1–10:42, Mar. 2010.
[3] “OpenStack Cloud Administrator Guide.” http://docs.openstack.org/admin-guide-cloud/content/ Last access in November 2014.
[4] “Hadoop for OpenStack Log Analysis.” http://www.slideshare.net/openstack/pittaro-open-stackloganalysis20130416-19109557/ Last access in November 2014.
[5] “Monitoring and Alerting for OpenStack.” http://www.subbu.org/blog/2013/10/monitoring-and-alerting-for-openstack Last access in November 2014.
[6] “Amazon Elastic Compute Cloud (Amazon EC2).” http://aws.amazon.com/ec2/ Last access in November 2014.
[7] “Google App Engine.” https://cloud.google.com/products/app-engine/ Last access in November 2014.
[8] “Windows Azure.” http://www.windowsazure.com/ Last access in November 2014.
[9] “OpenStack Cloud Software.” http://www.openstack.org/, 2010. Last access in November 2014.
[10] “Eucalyptus.” http://www.eucalyptus.com/ Last access in November 2014.
[11] “Nimbus Project.” http://www.nimbusproject.org/ Last access in November 2014.
[12] “OpenNebula.” http://opennebula.org/ Last access in November 2014.
[13] “Apache Hadoop.” http://hadoop.apache.org/ Last access in November 2014.
[14] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of cloud computing,” ACM Communications, vol. 53, pp. 50–58, Apr. 2010.
[15] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “Above the clouds: A berkeley view of cloud computing,” Tech. Rep. UCB/EECS-2009-28, EECS Department, University of California, Berkeley, Feb. 2009.
[16] R. Jhawar, V. Piuri, and M. Santambrogio, “Fault tolerance management in cloud computing: A system-level perspective,” Systems Journal, vol. 7, no. 2, 2012.
[17] R. Dudko, A. Sharma, and J. Tedesco, “Effective failure prediction in hadoop clusters,” tech. rep., University of Illinois, 2012.
[18] A. S. Thanamani, “A survey on failure prediction methods,” International Journal of Engineering Science and Technology (IJEST), vol. 3, no. 2, 2011.
[19] N. Kuromatsu, M. Okita, and K. Hagihara, “Evolving fault-tolerance in hadoop with robust auto-recovering jobtracker,” Bulletin of Networking, Computing, Systems, and Software, vol. 2, no. 1, 2013.
[20] E. Garduno, S. P. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, “Theia: visual signatures for problem diagnosis in large hadoop clusters,” in Proc. 26th International Conference on Large Installation System Administration: Strategies, Tools, and Techniques (LISA’12), (Berkeley, CA, USA), pp. 33–42, USENIX Association, 2012.
[21] J. Tan, S. Kavulya, R. Gandhi, and P. Narasimhan, “Visual, log-based causal tracing for performance debugging of mapreduce systems,” in Proc. 2010 IEEE 30th International Conference on Distributed Computing Systems (ICDCS’10), (Washington, DC, USA), pp. 795–806, IEEE Computer Society, 2010.
[22] X. Ju, L. Soares, K. G. Shin, and K. D. Ryu, “Towards a fault-resilient cloud management stack,” in 5th USENIX Workshop on Hot Topics in Cloud Computing, (San Jose, CA, USA), USENIX Association, 2013.
[23] D. Kondo, B. Javadi, A. Iosup, and D. Epema, “The failure trace archive: Enabling comparative analysis of failures in diverse distributed systems,” in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID ’10, (Washington, DC, USA), pp. 398–407, IEEE Computer Society, 2010.
[24] B. Javadi, D. Kondo, A. Iosup, and D. Epema, “The failure trace archive: Enabling the comparison of failure measurements and models of distributed systems,” Journal of Parallel and Distributed Computing, vol. 73, pp. 1208–1223, Aug. 2013.
[25] N. Yigitbasi, M. Gallet, D. Kondo, A. Iosup, and D. Epema, “Analysis and modeling of time-correlated failures in large-scale distributed systems,” in Grid Computing (GRID), 2010 11th IEEE/ACM International Conference on, pp. 65–72, 2010.
[26] M. Gallet, N. Yigitbasi, B. Javadi, D. Kondo, A. Iosup, and D. Epema, “A model for space-correlated failures in large-scale distributed systems,” in Proceedings of the 16th International Euro-Par Conference on Parallel Processing: Part I, EuroPar’10, (Berlin, Heidelberg), pp. 88–100, Springer-Verlag, 2010.
[27] T. N. Minh and G. Pierre, “Failure analysis and modeling in large multi-site infrastructures,” in Distributed Applications and Interoperable Systems, vol. 7891 of Lecture Notes in Computer Science, pp. 127–140, Springer Berlin Heidelberg, 2013.
[28] S. Kavulya, J. Tan, R. Gandhi, and P. Narasimhan, “An analysis of traces from a production mapreduce cluster,” in Proceedings of the 2010 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, CCGRID ’10, (Washington, DC, USA), pp. 94–103, IEEE Computer Society, 2010.
[29] G. Wu, H. Zhang, M. Qiu, Z. Ming, J. Li, and X. Qin, “A decentralized approach for mining event correlations in distributed system monitoring,” J. Parallel Distrib. Comput., vol. 73, pp. 330–340, Mar. 2013.
[30] T. Benson, S. Sahu, A. Akella, and A. Shaikh, “A first look at problems in the cloud,” in Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing, HotCloud’10, (Berkeley, CA, USA), pp. 15–15, USENIX Association, 2010.
[31] G. Lee, J. Lin, C. Liu, A. Lorek, and D. Ryaboy, “The unified logging infrastructure for data analytics at twitter,” Proc. VLDB Endow., vol. 5, pp. 1771–1780, Aug. 2012.
[32] “Event Correlation Engine.” ftp://ftp.tik.ee.ethz.ch/pub/students/2009-FS/MA-2009-01.pdf Last access in November 2014.
[33] A. Bouloutas, G. Hart, and M. Schwartz, “Simple finite-state fault detectors for communication networks,” Communications, IEEE Transactions on, vol. 40, pp. 477–479, Mar. 1992.
[34] R. Cronk, P. Callahan, and L. Bernstein, “Rule-based expert systems for network management and operations: an introduction,” Network, IEEE, vol. 2, pp. 7–21, Sept. 1988.
[35] R. Davis, H. Shrobe, W. Hamscher, K. Wieckert, M. Shirley, and S. Polit, “Computation and intelligence,” in Computation and Intelligence (G. F. Luger, ed.), ch. Diagnosis Based on Description of Structure and Function, pp. 623–634, Menlo Park, CA, USA: American Association for Artificial Intelligence, 1995.
[36] S. Yemini, S. Kliger, E. Mozes, Y. Yemini, and D. Ohsie, “High speed and robust event correlation,” Communications Magazine, IEEE, vol. 34, pp. 82–90, May 1996.
[37] D. M. Meira, “A model for alarm correlation in telecommunications networks,” 1997.
[38] A. Bouloutas, S. Calo, and A. Finkel, “Alarm correlation and fault identification in communication networks,” Communications, IEEE Transactions on, vol. 42, pp. 523–533, Feb. 1994.
[39] B. Gruschke, “Integrated event management: Event correlation using dependency graphs,” 1998.
[40] I. Ben-gal, “Bayesian networks.” http://www.eng.tau.ac.il/~bengal/BN.pdf, 2007.
[41] “Swatch.” http://sourceforge.net/projects/swatch/ Last access in November 2014.
[42] S. E. Hansen and E. T. Atkins, “Automated system monitoring and notification with swatch,” in Proceedings of the 7th USENIX Conference on System Administration, LISA ’93, (Berkeley, CA, USA), pp. 145–152, USENIX Association, 1993.
[43] “LogSurfer.” http://www.crypt.gen.nz/logsurfer/ Last access in November 2014.
[44] “SEC.” http://simple-evcorr.sourceforge.net/ Last access in November 2014.
[45] “OSSEC.” http://www.ossec.net/ Last access in November 2014.
[46] “Ganglia.” http://ganglia.sourceforge.net/ Last access in November 2014.
[47] M. L. Massie, B. N. Chun, and D. E. Culler, “The ganglia distributed monitoring system: Design, implementation and experience,” Parallel Computing, vol. 30, p. 2004, 2003.
[48] “Nagios.” http://www.nagios.org/ Last access in November 2014.
[49] “collectd.” http://collectd.org/ Last access in November 2014.
[50] “Riemann.” http://riemann.io/ Last access in November 2014.
[51] “Splunk.” http://www.splunk.com Last access in November 2014.
[52] “Splunkstorm.” http://www.splunkstorm.com/ Last access in November 2014.
[53] “Apache Whirr.” http://whirr.apache.org/ Last access in November 2014.
[54] “CDH.” http://www.cloudera.com/content/cloudera/en/products-and-services/cdh.html/ Last access in November 2014.
[55] “HDInsight.” http://www.windowsazure.com/en-us/documentation/services/hdinsight/ Last access in November 2014.
[56] “Amazon Elastic MapReduce (Amazon EMR).” http://aws.amazon.com/elasticmapreduce/ Last access in November 2014.
[57] “Savanna.” https://wiki.openstack.org/wiki/Savanna/ Last access in November 2014.
[58] “Mirantis.” http://www.mirantis.com/ Last access in November 2014.
[59] W. B. Cavnar and J. M. Trenkle, “N-gram-based text categorization,” in Proceedings of SDAIR-94, 3rd Annual Symposium on Document Analysis and Information Retrieval, pp. 161–175, 1994.
[60] T. Sipola, A. Juvonen, and J. Lehtonen, “Anomaly detection from network logs using diffusion maps,” Engineering Applications of Neural Networks, 2011.
[61] K. Sparck Jones, “Document retrieval systems,” ch. A Statistical Interpretation of Term Specificity and Its Application in Retrieval, pp. 132–142, London, UK: Taylor Graham Publishing, 1988.
[62] W. Xu, L. Huang, A. Fox, D. Patterson, and M. I. Jordan, “Detecting large-scale system problems by mining console logs,” in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, SOSP ’09, (New York, NY, USA), pp. 117–132, ACM, 2009.
[63] M. Aharon, G. Barash, I. Cohen, and E. Mordechai, “One graph is worth a thousand logs: Uncovering hidden structures in massive system event logs,” in Machine Learning and Knowledge Discovery in Databases (W. Buntine, M. Grobelnik, D. Mladenić, and J. Shawe-Taylor, eds.), vol. 5781 of Lecture Notes in Computer Science, pp. 227–243, Springer Berlin Heidelberg, 2009.
[64] J. Han, M. Kamber, and J. Pei, Data Mining: Concepts and Techniques. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 3rd ed., 2011.
[65] C. Yuan, N. Lao, J.-R. Wen, J. Li, Z. Zhang, Y.-M. Wang, and W.-Y. Ma, “Automated known problem diagnosis with event traces,” SIGOPS Oper. Syst. Rev., vol. 40, pp. 375–388, Apr. 2006.
[66] F. V. Jensen, Introduction to Bayesian Networks. Secaucus, NJ, USA: Springer-Verlag New York, Inc., 1st ed., 1996.
[67] D. Barbara, N. Wu, and S. Jajodia, “Detecting Novel Network Intrusions using Bayes estimators,” in SIAM International Conference on Data Mining, 2001.
[68] A. P. Engelbrecht, Computational Intelligence: An Introduction. Wiley Publishing, 2nd ed., 2007.
[69] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly detection: A survey,” ACM Comput. Surv., vol. 41, pp. 15:1–15:58, July 2009.
[70] S. Lloyd, “Least squares quantization in pcm,” IEEE Transactions on Information Theory, vol. 28, pp. 129–137, Sept. 2006.
[71] P. Brucker, “On the complexity of clustering problems,” in Optimization and Operations Research (R. Henn, B. Korte, and W. Oettli, eds.), vol. 157 of Lecture Notes in Economics and Mathematical Systems, pp. 45–54, Springer Berlin Heidelberg, 1978.
[72] Y. Guan, A. Ghorbani, and N. Belacel, “Y-means: a clustering method for intrusion detection,” in Electrical and Computer Engineering, 2003. IEEE CCECE 2003. Canadian Conference on, vol. 2, pp. 1083–1086 vol.2, May 2003.
[73] T. Kohonen, “Self-organized formation of topologically correct feature maps,” Biological Cybernetics, vol. 43, no. 1, pp. 59–69, 1982.
[74] V. Emamian, M. Kaveh, and A. Tewfik, “Robust clustering of acoustic emission signals using the kohonen network,” in Acoustics, Speech, and Signal Processing, 2000. ICASSP ’00. Proceedings. 2000 IEEE International Conference on, vol. 6, pp. 3891–3894 vol.6, 2000.
[75] K. Labib and R. Vemuri, “NSOM: A real-time network-based intrusion detection system using self-organizing maps,” Networks and Security, 2002.
[76] M. Ramadas, S. Ostermann, and B. Tjaden, “Detecting anomalous network traffic with self-organizing maps,” in Recent Advances in Intrusion Detection (G. Vigna, C. Kruegel, and E. Jonsson, eds.), vol. 2820 of Lecture Notes in Computer Science, pp. 36–54, Springer Berlin Heidelberg, 2003.
[77] S. Nousiainen, J. Kilpi, P. Silvonen, and M. Hiirsalmi, “Anomaly detection from server log data,” 2009.
[78] P. Kumpulainen and K. Hätönen, “Local anomaly detection for mobile network monitoring,” Inf. Sci., vol. 178, pp. 3840–3859, Oct. 2008.
[79] J. Zheng, M. Hu, B. Fang, and H. Zhang, “Anomaly detection using fast sofm,” in Grid and Cooperative Computing - GCC 2004 Workshops (H. Jin, Y. Pan, N. Xiao, and J. Sun, eds.), vol. 3252 of Lecture Notes in Computer Science, pp. 530–537, Springer Berlin Heidelberg, 2004.
[80] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A Density-Based algorithm for discovering clusters in large spatial databases with noise,” in Second International Conference on Knowledge Discovery and Data Mining (E. Simoudis, J. Han, and U. Fayyad, eds.), (Portland, Oregon), pp. 226–231, AAAI Press, 1996.
[81] A. Ram, S. Jalal, A. S. Jalal, and M. Kumar, “A density based algorithm for discovering density varied clusters in large spatial databases,” International Journal of Computer Applications, vol. 3, pp. 1–4, June 2010. Published by Foundation of Computer Science.
[82] A. Lakhina, M. Crovella, and C. Diot, “Diagnosing network-wide traffic anomalies,” in Proceedings of the 2004 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’04, (New York, NY, USA), pp. 219–230, ACM, 2004.
[83] “Comparing Open Source Private Cloud Platforms.” http://www.oscon.com/oscon2012/public/schedule/detail/24376/ Last access in November 2014.
[84] “OpenStack Documentation.” http://docs.openstack.org/ Last access in December 2014.
[85] “Devstack.” http://devstack.org/ Last access in December 2014.
[86] J. Dean and S. Ghemawat, “Mapreduce: Simplified data processing on large clusters,” Commun. ACM, vol. 51, pp. 107–113, Jan. 2008.
[87] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The google file system,” SIGOPS Oper. Syst. Rev., vol. 37, pp. 29–43, Oct. 2003.
[88] “Hive.” http://hive.apache.org/ Last access in December 2014.
[89] “Pig.” http://pig.apache.org/ Last access in December 2014.
[90] “Hcatalog.” https://hive.apache.org/hcatalog/ Last access in December 2014.
[91] “Hortonworks Sandbox 2.0.” http://hortonworks.com/products/hortonworks-sandbox/ Last access in December 2014.
[92] “Sean Lahman’s extensive historical baseball database.” http://seanlahman.com/files/database/lahman591-csv.zip Last access in December 2014.
[93] “Chukwa.” http://chukwa.apache.org/ Last access in December 2014.
[94] “Flume.” http://flume.apache.org/ Last access in December 2014.
[95] “Mahout.” http://mahout.apache.org/ Last access in December 2014.
[96] H. M. Tran, S. Ha, L. N. Hoang, and A. V. T. Tran, “Fault resolution system for intercloud environment,” Vietnam Academy of Science and Technology, vol. 51, no. 4B, 2013.
[97] “Mirantis OpenStack.” http://software.mirantis.com/key-related-openstack-projects/project-fuel/ Last access in November 2014.
[98] “Graphite.” http://graphite.wikidot.com/ Last access in November 2014.
[99] O. Klose, “Hadoop on linux on azure.” http://blogs.technet.com/b/oliviaklose/archive/2014/06/17/hadoop-on-linux-on-azure-1.aspx, 2014.
[100] “logstash.” http://logstash.net/ Last access in November 2014.
[101] “grok filter.” http://logstash.net/docs/1.4.0/filters/grok Last access in November 2014.
[102] “elasticsearch.” http://www.elasticsearch.org/ Last access in November 2014.
[103] “elasticsearch output.” http://logstash.net/docs/1.4.0/outputs/elasticsearch Last access in November 2014.
[104] “Kibana.” http://www.elasticsearch.org/overview/kibana/ Last access in November 2014.

... these systems. As a result, the intercloud environment, which fosters the centralization of various services, needs a large number of system administrators and supporting systems to manage faults occurring in the intercloud systems and services. It is necessary to develop a supporting system that can manage and analyse faults. In this thesis, we propose an approach for monitoring and analysing faults on the intercloud ...

Abstract

Nowadays, managing applications on the intercloud environment, especially monitoring faults, is becoming challenging due to the increasing complexity and diversity of these systems. The intercloud environment, which fosters the centralization of various services, needs a large number of system administrators and supporting systems to manage faults occurring in the intercloud systems and services. It is ... to develop a supporting system that can manage and analyse faults. This master thesis deals with the topic of fault management on intercloud systems. The thesis investigates multiple studies of faults, techniques, and related fault management software. We set up an intercloud environment and propose various approaches for monitoring and analysing faults on the intercloud system. In particular, we ...

... domains related to large data processing, such as indexing a large number of web pages, doing financial risk analysis and studying customer behavior. Given the variety of cloud and big data providers, consumers may have many workloads running across their intercloud environment. Managing applications on the intercloud environment, especially monitoring faults, is becoming challenging due to the increasing ...

... their ecosystems to understand the complexity of the intercloud environment. We deploy and integrate several open source tools for monitoring and analysing faults in the intercloud environment.

Keywords: Fault Management, Inter-Cloud, Cloud Computing, OpenStack, Hadoop, Event Correlation

Chapter 1 Introduction

Communication networks and distributed systems today ...

... intercloud environment. The approach uses open source technologies to facilitate monitoring and correlating service logs among cloud systems. The contribution is thus twofold:
1. Studying faults and existing techniques and tools of fault management on cloud systems. We also study OpenStack and Hadoop components and their ecosystems to understand the complexity of the intercloud environment.
2. Deploying and integrating ...

... to adapt to the increasing demand of users. Managing services operating on these systems is even more challenging. Cloud computing has recently emerged as a new paradigm of provisioning infrastructure, platform, and software as services over the Internet. This paradigm combines distributed computing resources and virtualization technologies that outsource not only platform and software but also infrastructure ...

... Deploying and integrating several open source tools for monitoring and analysing faults. In particular, we collect and process service logs on the intercloud environment, including OpenStack and Hadoop components. The rest of the thesis is structured as follows: the next chapter presents the literature review of faults and a survey of tools and techniques for fault management on single cloud, intercloud environment ...

... monitoring and analysing faults on the intercloud environment with the system architecture and component communication. Chapter 4 provides experiments for monitoring and analysing faults on the intercloud systems. Chapter 5 concludes this thesis with a short discussion of the ongoing work. Last but not least, Appendix A provides the details of the setup and configuration that have been used in ...

... Stream Processing) and CEP (Complex Event Processing) applications in Java (additionally, NEsper, written in C#, can be used with .NET). Although Esper is not primarily targeted at network event correlation, it is a CEP and ESP toolkit certainly worth mentioning.

2.3 Related Cloud and Fault Management Software

Figure 2-3 shows where the cloud and fault management software is located in the intercloud environment ...
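
As a rough illustration of the log-processing step that the Logstash configuration in Appendix A.3 performs (merging continuation lines into the previous event, then splitting each event into timestamp, level, program and message), the following Python sketch mimics that behaviour outside Logstash. It is not the thesis implementation: the line layout, the list of level names (the custom AUDITLOGLEVEL pattern is not shown in this text) and the sample lines are all assumptions made for the example.

```python
import re

# Assumed, simplified layout: "<timestamp> <LEVEL> <program> <message>".
# The level names are an assumption; the thesis's AUDITLOGLEVEL pattern is not shown.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}(?:\.\d+)?)\s+"
    r"(?P<level>AUDIT|DEBUG|INFO|WARNING|ERROR|CRITICAL|TRACE)\s+"
    r"(?P<program>[\w.-]+)\s*"
    r"(?P<message>.*)$"
)
# A line that starts with a date opens a new event; any other line continues the previous one.
STARTS_WITH_DATE = re.compile(r"^\d{4}-\d{2}-\d{2}")


def merge_multiline(lines):
    """Join continuation lines (e.g. traceback lines) onto the preceding event."""
    events = []
    for line in lines:
        line = line.rstrip("\n")
        if STARTS_WITH_DATE.match(line) or not events:
            events.append(line)
        else:
            events[-1] += "\n" + line
    return events


def parse_event(event):
    """Return the named fields of one event, or None if its first line does not match."""
    first, _, rest = event.partition("\n")
    match = LINE_RE.match(first)
    if match is None:
        return None
    fields = match.groupdict()
    if rest:
        fields["message"] += "\n" + rest
    return fields


if __name__ == "__main__":
    sample = [  # made-up lines, for illustration only
        "2014-11-05 10:12:01.123 ERROR nova.compute.manager Instance failed to spawn",
        "Traceback (most recent call last):",
        '  File "manager.py", line 1, in spawn',
        "2014-11-05 10:12:02.456 INFO nova.scheduler Selected host compute-1",
    ]
    for event in merge_multiline(sample):
        print(parse_event(event))
```

Merging before parsing mirrors the intent of the multiline filter with negate => true and what => "previous": a stack trace stays attached to the error line that produced it, so a later correlation step sees one event per fault rather than many unrelated lines.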
