Optimizing Knowledge Reuse within Firms: Frameworks, Strategies and Emerging Tools

DOCUMENT INFORMATION

Basic information

Pages: 162
Size: 1.8 MB

Contents

OPTIMIZATION TECHNIQUES FOR COMPLEX MULTI-QUERY APPLICATIONS
Wang Guoping
NATIONAL UNIVERSITY OF SINGAPORE
2014

NATIONAL UNIVERSITY OF SINGAPORE
DOCTORAL THESIS
Supervisor: Prof. Chan Chee Yong
Author: Wang Guoping
A thesis submitted for the degree of Doctor of Philosophy in the Department of Computer Science, School of Computing, 2014

DECLARATION
I hereby declare that this thesis is my original work and that it has been written by me in its entirety. I have duly acknowledged all the sources of information which have been used in the thesis. This thesis has also not been submitted for any degree in any university previously.
Wang Guoping
January 2014

ACKNOWLEDGEMENT
I would like to express my deepest appreciation to my supervisor, Prof. Chan Chee Yong. Without his guidance and persistent help, this thesis would not have been finished. Over the last few years, he has spent countless hours patiently guiding me to develop interesting ideas, strengthen the algorithms and improve my writing. As a supervisor, he has shown wisdom, insight, broad knowledge and a conscientious attitude, all of which have set me a good example of how to be a good researcher.

Beyond my research, he has also helped me a great deal in my personal life. After my scholarship ended, he hired me as a research assistant and gave me GSR support under his research grant, so that I could concentrate on my research without worrying about financial problems. During my job hunting, he gave me many valuable suggestions and comments. I am truly grateful to have had him as my supervisor during my Ph.D.

I would like to thank my thesis committee, Prof. Tan Kian Lee and Prof. Stephane Bressan, for their valuable comments on my thesis and for their recommendation letters for my research assistant position and my job hunting.

I would like to thank all my friends in the database group who have made my Ph.D. life more colorful. They are Bao Zhifeng, Li Lu, Li Hao, Zeng Zhong, Kang Wei, Zhou Jingbo, Tang Ruiming, Song Yi, Zeng Yong, Xiao Qian and many others. Special thanks go to the church events organized every year by Prof. Tan Kian Lee and Dr. Wang Zhengkui, which bring us together as a family.

Finally, I would like to thank my parents for their silent support and their trust in every decision I made during my Ph.D.

CONTENTS
Declaration
Acknowledgement
Summary
1 Introduction
1.1 Multiple Query Optimization
1.2 Research Problems
1.2.1 Efficient Processing of Enumerative Set-based Queries
1.2.2 Multi-Query Optimization in MapReduce Framework
1.2.3 Optimal Join Enumeration in MapReduce Framework
1.3 Thesis Contributions
1.4 Thesis Organization
2 Related Work
2.1 Preliminaries on MapReduce
2.2 Efficient Processing of Enumerative Set-based Queries
2.3 Multi-Query Optimization in MapReduce Framework
2.4 Optimal Join Enumeration in MapReduce Framework
3 Efficient Processing of Enumerative Set-based Queries
3.1 Overview
3.2 Set-based Queries
3.3 Preliminaries
3.4 Baseline Solution using SQL
3.4.1 Baseline Solution
3.4.2 Detail Illustration of Baseline Solution
3.5 Basic Approach
3.6 Handling Large Data
3.6.1 Phase 1: Partitioning Phase
3.6.2 Phase 2: Enumeration Phase
3.6.3 Progressive Approaches
3.7 Extensions and Optimizations
3.7.1 Evaluation of SQs
3.7.2 Optimizations of SQ Evaluation
3.8 Performance Study
3.8.1 Results for BSQs on Synthetic Datasets
3.8.2 Results for BSQs on Real Dataset
3.8.3 Results for SQs on Synthetic Datasets
3.8.4 Results for SQs on Real Dataset
3.9 Summary
4 Multi-Query Optimization in MapReduce Framework
4.1 Overview
4.2 Assumptions & Notations
4.3 Multi-job Optimization Techniques
4.3.1 Grouping Technique
4.3.2 Generalized Grouping Technique
4.3.3 Materialization Techniques
4.3.4 Discussions
4.4 Cost Model
4.4.1 A Cost Model for MapReduce
4.4.2 Costs for the Proposed Techniques
4.5 Optimization Algorithms
4.5.1 Map Output Key Ordering Algorithm
4.5.2 Partitioning Algorithm
4.6 Experimental Results
4.6.1 Performance Comparison
4.6.2 Effectiveness of Key Ordering Algorithm
4.6.3 Optimization vs Evaluation time
4.7 Summary
5 Optimal Join Enumeration in MapReduce Framework
5.1 Overview
5.2 Preliminaries
5.2.1 Notations
5.2.2 Assumptions
5.3 Complexity of SOJE Problem
5.4 Single-Query Join Enumeration Algorithm
5.4.1 Baseline Join Enumeration Algorithms
5.4.2 Plan Enumeration Algorithm
5.4.3 Bottom-up and Top-down Enumerations
5.5 Multi-Query Join Enumeration Algorithm
5.5.1 First Phase
5.5.2 Second Phase
5.6 Experimental Results
5.6.1 Efficiency of Single-Query Join Enumeration Algorithm
5.6.2 Efficiency of Multi-Query Join Enumeration Algorithm
5.7 Summary
6 Conclusion
6.1 Contributions
6.2 Future Work
Bibliography

SUMMARY
Many applications involve complex multiple queries that share many common subexpressions (CSEs). Identifying and exploiting these CSEs to improve query performance is essential in such applications. Multiple query optimization (MQO), which aims to identify and exploit the CSEs among queries in order to reduce the overall query evaluation cost, has been studied extensively for over two decades, and existing work has shown it to be an effective technique in both RDBMS and MapReduce contexts. In this thesis, we study the following three novel MQO problems.

First, we study the problem of efficiently processing enumerative set-based queries (SQs) in an RDBMS. Enumerative SQs aim to find all the sets of entities of interest that meet certain constraints. In this work, we present a novel approach that evaluates enumerative SQs as a collection of cross-product queries (CPQs), and we propose efficient and scalable MQO heuristics to optimize the evaluation of a collection of CPQs. Our experimental results demonstrate that the proposed approach is significantly more efficient than conventional RDBMS methods. To the best of our knowledge, this is the first work that addresses the efficient evaluation of a collection of CPQs.

Second, we study multi-query/job optimization techniques and algorithms in the MapReduce framework. In this work, we first propose two new multi-job optimization techniques to share the map input scan and the map output in the MapReduce paradigm. We then propose a new optimization algorithm that, given an input batch of jobs, produces an optimal plan through a judicious partitioning of the jobs into groups and an optimal assignment of a processing technique to each group. Our experimental results on Hadoop demonstrate [...]

5.3 Limitations and Future Research
Bibliography
Appendices
Appendix A Survey Questionnaire
Appendix B Calculations of Statistical Indicators
B.1 Measurement model indicators
B.2 Structural model indicators
Appendix C Common Method Variance (CMV) Assessment
C.1 CMV assessment regarding knowledge seeking from close colleagues
C.2 CMV assessment regarding knowledge seeking from distant colleagues
Appendix D Results of main effects of SNS use, Ability and Motivation on Knowledge Reuse
D.1 Knowledge sharing with close colleagues
D.2 Knowledge sharing with distant colleagues
D.3 Knowledge seeking from close colleagues
D.4 Knowledge seeking from distant colleagues

Summary
Optimizing knowledge reuse within firms is critical for firms to sustain competitive advantage. However, the problem remains of how to move knowledge effectively and efficiently from the employees who created it to those who need it. As every firm is different, firms should make decisions according to their specific context. This thesis, comprising three studies, seeks to shed some light on how to make decisions for optimizing knowledge reuse within firms.

The first study (Chapter 2) explores an integrative framework for understanding knowledge reuse within firms. Although numerous studies have examined knowledge reuse and its influencing factors from different perspectives, few offer a holistic picture that organizes these factors and their interactions. This prevents existing findings from being applied effectively in practice. Against this backdrop, the first study proposes an integrative framework. The proposed framework provides a starting point for optimizing knowledge reuse within firms. It also enables researchers to place existing and future studies on the management of knowledge reuse within a holistic picture.

The second study (Chapter 3) explores how to develop strategies for optimizing knowledge reuse.
Knowledge management strategies are classified as codification and personalization, which imply different costs and benefits for a firm. The optimum strategy usually requires a mix of codification and personalization according to the organizational context. However, few theories guide firms in deciding on the optimum mix. Therefore, the second study develops a formal approach by introducing a Markov Decision Process model for knowledge reuse. This approach allows firms to determine the optimum mix based on an analysis of benefits and costs in their specific context.

The third study (Chapter 4) addresses how firms should deal with emerging technologies that provide alternative tools for implementing knowledge management strategies. At present, social media is one such phenomenon. According to the proposed framework, social media influences knowledge reuse not only through changes in the organizational cost of investment, but also through changes in individual behaviors. The third study provides insights into integrating social media for knowledge reuse purposes by examining whether and how the use of social media influences knowledge reuse at the individual level. The survey results show that firms should recognize the different needs of employees as knowledge producers and knowledge consumers at different stages of the knowledge reuse process. In addition to the direct investment cost of implementing social media, these individual-level concerns must be addressed for successful application.

In sum, this thesis contributes to decision-making for optimizing knowledge reuse within firms in three different but related aspects: (i) an integrative framework that serves as a starting point for firms to analyze the problem of knowledge reuse; (ii) a formal approach for developing the optimum knowledge management strategy; and (iii) insights into integrating emerging technologies (social media in particular) for optimizing knowledge reuse within firms.
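To make the idea of a Markov Decision Process concrete, the following is a minimal Python sketch of how an MDP turns per-action benefits and costs into a state-dependent choice between codification and personalization. It is not the thesis's model: the two states (`low_coverage`, `high_coverage`), the transition probabilities, the rewards, and the discount factor of 0.9 are all hypothetical values chosen for illustration. Standard value iteration computes the optimal long-run value of each state and the action that attains it.

```python
"""Hypothetical illustration only: value iteration on a toy two-state MDP that
chooses between codification and personalization. All states, probabilities,
and rewards are invented for this sketch and are NOT the thesis's model."""

from typing import Dict, List, Tuple

# Hypothetical states: how much of the firm's knowledge is already codified.
STATES: List[str] = ["low_coverage", "high_coverage"]
ACTIONS: List[str] = ["codify", "personalize"]

Transition = List[Tuple[str, float]]  # (next_state, probability) pairs

# MODEL[state][action] = (reward, transitions)
# reward = hypothetical benefit minus cost of taking the action in that state.
MODEL: Dict[str, Dict[str, Tuple[float, Transition]]] = {
    "low_coverage": {
        "codify": (-1.0, [("high_coverage", 0.8), ("low_coverage", 0.2)]),
        "personalize": (0.5, [("low_coverage", 1.0)]),
    },
    "high_coverage": {
        "codify": (1.0, [("high_coverage", 1.0)]),
        "personalize": (3.0, [("high_coverage", 0.8), ("low_coverage", 0.2)]),
    },
}


def q_value(values: Dict[str, float], state: str, action: str, gamma: float) -> float:
    """Expected discounted return of taking `action` in `state`, then using `values`."""
    reward, transitions = MODEL[state][action]
    return reward + gamma * sum(prob * values[nxt] for nxt, prob in transitions)


def value_iteration(gamma: float = 0.9, tol: float = 1e-9) -> Tuple[Dict[str, float], Dict[str, str]]:
    """Standard value iteration; returns state values and the greedy (optimal) policy."""
    values = {s: 0.0 for s in STATES}
    while True:
        # Bellman optimality update: best action value in every state.
        new_values = {s: max(q_value(values, s, a, gamma) for a in ACTIONS) for s in STATES}
        delta = max(abs(new_values[s] - values[s]) for s in STATES)
        values = new_values
        if delta < tol:
            break
    # Greedy policy with respect to the converged values.
    policy = {s: max(ACTIONS, key=lambda a: q_value(values, s, a, gamma)) for s in STATES}
    return values, policy


if __name__ == "__main__":
    values, policy = value_iteration()
    print(policy)  # {'low_coverage': 'codify', 'high_coverage': 'personalize'}
    print({s: round(v, 2) for s, v in values.items()})  # {'low_coverage': 18.8, 'high_coverage': 22.8}
```

With these invented numbers, the optimal policy codifies when repository coverage is low and personalizes when it is high, so the resulting strategy is a mix rather than pure codification or pure personalization; different benefit and cost estimates would produce a different mix.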
[...] other hand, there are also many companies reporting that their KM systems have failed (Chua and Lam, 2005). Thus, there is a need to better understand knowledge reuse within firms. In general, knowledge reuse involves two types of roles (knowledge producers, who create and share knowledge with others, and knowledge consumers, who seek and reuse the shared knowledge) and the transfer of knowledge from knowledge [...]

[...] codify knowledge) so that knowledge can be effectively and efficiently reused to reap the maximal value. As such, the term knowledge reuse is adopted as much as possible in this thesis.

1.2.3 Knowledge Reuse and Knowledge Sharing/Knowledge Transfer
In a broad sense, knowledge reuse, knowledge sharing, and knowledge transfer refer to the same process of knowledge movement, only with different emphases. Studies of knowledge [...]

1.2.2 Knowledge Reuse and Knowledge Management
Knowledge reuse is defined herein as the totality of knowledge re-applied within an organization over a certain time period (Chai and Nebus, 2012). It is construed as an organizational-level concept that relates closely to economic concerns. Knowledge reuse includes individual-level knowledge sharing by knowledge producers, individual-level knowledge seeking and reuse [...]

[...] thesis aims to achieve, and they are described as follows. The first objective is to develop an integrative framework for understanding the problem of knowledge reuse within firms. Due to the importance and difficulty of managing knowledge within firms, numerous studies have been conducted to understand this issue and its influencing factors from different perspectives (Wang and Noe, 2010). For example, [...]
[...] any decisions about optimizing knowledge reuse, firms need to understand the problem of knowledge reuse in a comprehensive manner. According to Porter (1991, p. 98), “A framework can help the analyst to better think through the problem by understanding the firm and its environment and defining and selecting among the strategic alternatives available, no matter what the industry and starting position” [...]

[...] categorizes and reviews the various factors influencing knowledge reuse. Section 3 presents a complete view of the knowledge reuse process. Section 4 presents the proposed integrative framework. Section 5 illustrates how this framework might be applied in practice. Section 6 concludes this chapter.

2.2 Factors Influencing Knowledge Reuse
As introduced before, knowledge reuse involves knowledge producers and knowledge [...]

[...] organizational reward systems and norms, on knowledge reuse within firms through codification (e.g., electronic repositories) and personalization (e.g., interaction networks) (e.g., Bartol and Srivastava, 2002; Haas and Hansen, 2007; Lee and Ahn, 2007). For instance, monetary reward is more effective for knowledge sharing through codification, whereas fairness and merit pay are more crucial in knowledge sharing through [...]

[...] certain piece of knowledge transfer and manages it as a project with a definite start and end. However, for the purpose of optimizing knowledge reuse, this model is not suitable because it does not look into the needs of knowledge producers and knowledge consumers along the reuse process. Taking a knowledge recipient’s perspective, Markus (2001) divided the process of knowledge reuse through an electronic repository [...]
[...] codification and personalization strategies. This model enables firms to develop the optimum mix of codification and personalization based on an analysis of the benefits and costs of managing knowledge reuse in specific contexts. Chapter 4 (Study 3) addresses the third research objective by investigating the relationship between the use of social media and knowledge reuse performance at the individual level and providing [...]

[...] the Use of Social Media and Knowledge Reuse: Implications and Suggestions for Integration
Chapter 5 Conclusion
Figure 1-1 Overview of the thesis structure

Chapter 2 Managing Knowledge Reuse within Firms: An Integrative Framework
2.1 Introduction
As mentioned in Chapter 1, firms today compete on a knowledge basis. Many strategic management studies have revealed the importance of knowledge for a firm [...]

OPTIMIZING KNOWLEDGE REUSE WITHIN FIRMS: FRAMEWORKS, STRATEGIES AND EMERGING TOOLS
LIU HONGMEI
(B.M. & M.M. in MIS, Harbin [...]

[...] Study 1
1.2 Working Definitions
1.2.1 Knowledge
1.2.2 Knowledge Reuse and Knowledge Management
1.2.3 Knowledge Reuse and Knowledge Sharing/Knowledge Transfer
1.3 Objectives of the [...]

Posted: 09/09/2015, 11:24
