Distributed Database Management Systems: Lecture 33

DOCUMENT INFORMATION

Basic information

Title	Data Localization for Hybrid Fragmentation and Query Optimization
Institution	Standard University
Field	Distributed Database Management Systems
Type	Lecture
Year of publication	2023
City	Standard City

Format
Number of pages	36
File size	93.57 KB

Content

Distributed Database Management Systems: Lecture 33. The main topics covered in this lecture include: data localization for hybrid fragmentation (HyF), which contains both types of fragmentation; and query optimization (QO), which refers to producing a query execution plan (QEP) that represents the execution strategy; ...

In the previous lecture
• Final phase of query decomposition (QD)
• Data localization for horizontal (HF), vertical (VF), and derived (DF) fragmentation

In today's lecture
• Data localization for hybrid fragmentation
• Query optimization

Reduction for HyF
• Hybrid fragmentation (HyF) contains both types of fragmentation
• EMP1 = σ_{eNo ≤ 'E4'}(Π_{eNo, eName}(EMP))
• EMP2 = σ_{eNo > 'E4'}(Π_{eNo, eName}(EMP))
• EMP3 = Π_{eNo, title}(EMP)
• Query: SELECT eName FROM EMP WHERE eNo = 'E5'
• Generic query: Π_{eName}(σ_{eNo = 'E5'}((EMP1 ∪ EMP2) ⋈_{eNo} EMP3))
• Reduced query: Π_{eName}(σ_{eNo = 'E5'}(EMP2))
• The selection eNo = 'E5' contradicts the predicate of EMP1 (eNo ≤ 'E4'), and the projection on eName makes EMP3 unnecessary, so only EMP2 remains

Summary of what we have done so far
• Query decomposition: generates an efficient query in relational algebra
– Normalization, analysis, simplification, rewriting
• Data localization: applies the global query to fragments; increases the optimization level
• Next is cost-based optimization
– Mainly concentrates on the order of performing joins
– Characteristics of relations, such as cardinalities, are considered
• First, query optimization (QO) in general
• QO refers to producing a query execution plan (QEP) that represents the execution strategy
• Components of the optimizer
– Search space: the set of equivalent alternative execution plans
– Cost model: predicts the cost of an execution plan
– Search strategy: produces the best plan
• T_CPU = time of a CPU instruction
• T_I/O = time of a disk I/O
• T_MSG = fixed time for initiating and receiving a message
• T_TR = time to transmit a data unit from one site to another
• Example (figure): two sites send x and y data units, respectively, to a third site
• Total time: TT = 2 * T_MSG + T_TR * (x + y)
• Response time: RT = max{T_MSG + T_TR * x, T_MSG + T_TR * y}

Database Statistics
• The major factor is the size of intermediate tables
• If intermediate results are to be transmitted, an estimate of their size is a must
• More precise statistics cost more to maintain
• For each relation R[A1, A2, …, An] fragmented as R1, …, Rr:
1. Length of each attribute: length(Ai)
2. Number of distinct values of each attribute in each fragment: card(Π_{Ai}(Rj))
3. Maximum and minimum values in the domain of each attribute: min(Ai), max(Ai)
4. Cardinality of each domain, card(dom[Ai]), and cardinality of each fragment, card(Rj)
5. Join selectivity factor for some pairs of relations: SF_J(R, S) = card(R ⋈ S) / (card(R) * card(S))

Cardinalities of Intermediate Results

Selection Operation
• card(σ_F(R)) = SF_S(F) * card(R)
• SF_S(A = value) = 1 / card(Π_A(R))
• SF_S(A > value) = (max(A) - value) / (max(A) - min(A))
• SF_S(A < value) = (value - min(A)) / (max(A) - min(A))
• SF_S(p(Ai) ∧ p(Aj)) = SF_S(p(Ai)) * SF_S(p(Aj))
• SF_S(p(Ai) ∨ p(Aj)) = SF_S(p(Ai)) + SF_S(p(Aj)) - (SF_S(p(Ai)) * SF_S(p(Aj)))

Cardinality of Projection
• Hard to determine precisely
• Two cases in which it is trivial:
1. When the projection is on a single attribute A: card(Π_A(R)) = card(A), the number of distinct values of A
2. When the primary key (PK) is included in A: card(Π_A(R)) = card(R)

Cartesian Product
• card(R × S) = card(R) * card(S)

Cardinality of Join
• There is no general way to estimate it without additional information
• For a PK/FK combination, where R holds the primary key and S the foreign key: card(R ⋈ S) = card(S)

Semi Join
• SF_SJ(R ⋉_A S) = card(Π_A(S)) / card(dom[A])
• card(R ⋉_A S) = SF_SJ(S.A) * card(R)

Union
• Hard to estimate
• Only limits are possible: the upper limit is card(R) + card(S) and the lower limit is max{card(R), card(S)}

Difference
• Like union, only limits are possible: card(R) is the upper limit for (R - S), and ...

Centralized Query Optimization
Why study it
1. A distributed query is transformed into local ones
2. The issues are related, and more complex in distributed databases (DD)
3. It is easier to understand
• Two famous optimizers
– INGRES: dynamic; recursively breaks the query into smaller ones
– System R: static; exhaustive search
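
The two communication cost measures in the lecture, total time (TT) and response time (RT), can be illustrated with a small sketch. The class name `CommCost`, its method names, and the numeric values are hypothetical and chosen only for illustration; the formulas are the ones given for the case in which two sites send x and y data units to a third site.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CommCost:
    """Communication parameters of the cost model (illustrative values only)."""
    t_msg: float  # T_MSG: fixed time for initiating and receiving a message
    t_tr: float   # T_TR: time to transmit one data unit between two sites

    def total_time(self, x: float, y: float) -> float:
        # TT adds up all work: two messages plus the transfer of x + y units.
        return 2 * self.t_msg + self.t_tr * (x + y)

    def response_time(self, x: float, y: float) -> float:
        # RT assumes both transfers run in parallel, so the slower one dominates.
        return max(self.t_msg + self.t_tr * x, self.t_msg + self.t_tr * y)


# Hypothetical parameters: T_MSG = 1.0, T_TR = 0.5, x = 10 units, y = 20 units.
cost = CommCost(t_msg=1.0, t_tr=0.5)
tt = cost.total_time(10, 20)
rt = cost.response_time(10, 20)
print(f"TT = {tt}, RT = {rt}")
```

With these assumed values TT exceeds RT, which reflects the difference between the two measures: TT counts every transfer, while RT counts only the longest of the parallel transfers.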

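The selectivity factors and cardinality estimates listed in the lecture can be sketched as plain functions. This is a minimal sketch, not an optimizer implementation: all function names are hypothetical, the statistics (1000 tuples, 8 distinct values, a domain from 0 to 100) are made-up inputs, and the formulas are the ones from the lecture.

```python
def sf_eq(distinct_values: int) -> float:
    """SF_S(A = value) = 1 / card(Π_A(R))."""
    return 1.0 / distinct_values


def sf_gt(value: float, min_a: float, max_a: float) -> float:
    """SF_S(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (max_a - value) / (max_a - min_a)


def sf_lt(value: float, min_a: float, max_a: float) -> float:
    """SF_S(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - min_a) / (max_a - min_a)


def sf_and(sf_i: float, sf_j: float) -> float:
    """Conjunction of two predicates: product of their selectivities."""
    return sf_i * sf_j


def sf_or(sf_i: float, sf_j: float) -> float:
    """Disjunction: sum of the selectivities minus their product."""
    return sf_i + sf_j - sf_i * sf_j


def card_selection(sf: float, card_r: int) -> float:
    """card(σ_F(R)) = SF_S(F) * card(R)."""
    return sf * card_r


def card_semijoin(distinct_a_in_s: int, card_dom_a: int, card_r: int) -> float:
    """card(R ⋉_A S) = (card(Π_A(S)) / card(dom[A])) * card(R)."""
    return (distinct_a_in_s / card_dom_a) * card_r


def union_limits(card_r: int, card_s: int) -> tuple[int, int]:
    """Lower and upper limits on card(R ∪ S)."""
    return max(card_r, card_s), card_r + card_s


# Made-up statistics for a relation R with attributes A and B.
CARD_R = 1000
p_eq = sf_eq(distinct_values=8)              # A = value
p_gt = sf_gt(value=75, min_a=0, max_a=100)   # B > 75
est_and = card_selection(sf_and(p_eq, p_gt), CARD_R)
est_or = card_selection(sf_or(p_eq, p_gt), CARD_R)
est_semi = card_semijoin(distinct_a_in_s=20, card_dom_a=80, card_r=CARD_R)
print(est_and, est_or, est_semi, union_limits(CARD_R, 400))
```

The estimates are fractional because they are expected values under a uniform distribution of attribute values; a real optimizer would round them and might rely on more detailed statistics.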
Date posted: 05/07/2022, 13:41