Distributed Database Management Systems: Lecture 33. This lecture covers data localization for hybrid fragmentation (HyF), which combines both types of fragmentation (horizontal and vertical), and query optimization (QO), which refers to producing a query execution plan (QEP) that represents the execution strategy.
In the previous lecture
• Final phase of query decomposition (QD)
• Data localization for horizontal (HF), vertical (VF), and derived (DF) fragmentation

In today's lecture
• Data localization for hybrid fragmentation (HyF)
• Query optimization (QO)

Reduction for HyF
• HyF contains both types of fragmentation (horizontal and vertical)
• $EMP_1 = \sigma_{eNo \le E4}(\Pi_{eNo,\,eName}(EMP))$
• $EMP_2 = \sigma_{eNo > E4}(\Pi_{eNo,\,eName}(EMP))$
• $EMP_3 = \Pi_{eNo,\,title}(EMP)$
• Query: Select eName from EMP where eNo = E5
• Generic query: $\Pi_{eName}(\sigma_{eNo = E5}((EMP_1 \cup EMP_2) \bowtie_{eNo} EMP_3))$
• Reduced query: $\Pi_{eName}(\sigma_{eNo = E5}(EMP_2))$

Summary of what we have done so far
• Query decomposition: generates an efficient query in relational algebra
  – Normalization, analysis, simplification, rewriting
• Data localization: applies the global query to fragments; increases the optimization level
• Next is cost-based optimization
  – Mainly concentrates on the order in which joins are performed
  – Characteristics of relations, such as cardinalities, are considered

Query optimization in general
• QO refers to producing a query execution plan (QEP) that represents the execution strategy

Components of the optimizer
• Search space: the set of equivalent alternative execution plans
• Cost model: predicts the cost of an execution plan
• Search strategy: produces the best plan
• The cost model uses the following unit times:
  – $T_{CPU}$: time for a CPU instruction
  – $T_{I/O}$: time for a disk I/O
  – $T_{MSG}$: fixed time for initiating and receiving a message
  – $T_{TR}$: time to transmit a data unit from one site to another
• [Figure: two sites send x units and y units of data, respectively, to a third site]
• Total time: $TT = 2\,T_{MSG} + T_{TR} \cdot (x + y)$
• Response time: $RT = \max\{T_{MSG} + T_{TR} \cdot x,\; T_{MSG} + T_{TR} \cdot y\}$

Database Statistics
• The major cost factor is the size of intermediate tables
• If intermediate results are to be transmitted, an estimate of their size is a must
• More precise statistics cost more
• For each relation $R[A_1, A_2, \dots, A_n]$ fragmented as $R_1, \dots, R_r$:
1. The length of each attribute: $length(A_i)$
2. The number of distinct values of each attribute in each fragment: $card(\Pi_{A_i}(R_j))$
3. The maximum and minimum values in the domain of each attribute: $min(A_i)$, $max(A_i)$
4. The cardinality of each domain, $card(dom[A_i])$, and of each fragment, $card(R_j)$
5. The join selectivity factor for some pairs of relations: $SF_J(R, S) = card(R \bowtie S) / (card(R) \cdot card(S))$

Cardinalities of Intermediate Results

Selection Operation
• $card(\sigma_F(R)) = SF_S(F) \cdot card(R)$
• $SF_S(A = value) = 1 / card(\Pi_A(R))$
• $SF_S(A > value) = (max(A) - value) / (max(A) - min(A))$
• $SF_S(A < value) = (value - min(A)) / (max(A) - min(A))$
• $SF_S(p(A_i) \wedge p(A_j)) = SF_S(p(A_i)) \cdot SF_S(p(A_j))$
• $SF_S(p(A_i) \vee p(A_j)) = SF_S(p(A_i)) + SF_S(p(A_j)) - SF_S(p(A_i)) \cdot SF_S(p(A_j))$

Cardinality of Projection
• Hard to determine precisely
• Two cases in which it is trivial:
1. When the projection is on a single attribute A: $card(\Pi_A(R)) = card(A)$
2. When the primary key is included in the projected attributes: $card(\Pi_A(R)) = card(R)$

Cartesian Product
• $card(R \times S) = card(R) \cdot card(S)$

Cardinality of Join
• There is no general way to estimate it without additional information
• For a primary key / foreign key combination, where the join attribute is the primary key of R and a foreign key in S: $card(R \bowtie S) = card(S)$

Semijoin
• $SF_{SJ}(R \ltimes_A S) = card(\Pi_A(S)) / card(dom[A])$
• $card(R \ltimes_A S) = SF_{SJ}(S.A) \cdot card(R)$

Union
• Hard to estimate
• Only bounds are possible: the upper bound is $card(R) + card(S)$ and the lower bound is $\max\{card(R), card(S)\}$

Difference
• Like union, only bounds are possible: $card(R)$ is the upper bound for $R - S$

Centralized Query Optimization

Why study it
1. A distributed query is transformed into local ones
2. The issues are related, and more complex in distributed databases
3. It is easier to understand

Two famous optimizers
• INGRES: dynamic; recursively breaks the query into smaller ones
• System R: static; exhaustive search
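
As a rough illustration of the cost formulas (TT, RT) and the selectivity factors given earlier in this lecture, the following Python sketch evaluates them on made-up numbers. The class AttrStats, all function names, and the sample values are hypothetical and chosen only for this example; the formulas themselves are the ones from the slides, with the OR and union expressions written in their standard form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttrStats:
    """Statistics for one attribute A (hypothetical container for this sketch)."""
    distinct: int    # card(Pi_A(R)): number of distinct values of A
    min_val: float   # min(A)
    max_val: float   # max(A)


def total_time(t_msg, t_tr, x, y):
    """TT = 2*TMSG + TTR*(x + y): every message and every unit is counted."""
    return 2 * t_msg + t_tr * (x + y)


def response_time(t_msg, t_tr, x, y):
    """RT = max{TMSG + TTR*x, TMSG + TTR*y}: the two transfers overlap."""
    return max(t_msg + t_tr * x, t_msg + t_tr * y)


def sf_eq(a):
    """SF_S(A = value) = 1 / card(Pi_A(R))."""
    return 1 / a.distinct


def sf_gt(a, value):
    """SF_S(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (a.max_val - value) / (a.max_val - a.min_val)


def sf_lt(a, value):
    """SF_S(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - a.min_val) / (a.max_val - a.min_val)


def sf_and(s1, s2):
    """Conjunction of two predicates: product of the selectivities."""
    return s1 * s2


def sf_or(s1, s2):
    """Disjunction of two predicates: s1 + s2 - s1*s2."""
    return s1 + s2 - s1 * s2


def card_select(sf, card_r):
    """card(sigma_F(R)) = SF_S(F) * card(R)."""
    return sf * card_r


def union_bounds(card_r, card_s):
    """(lower, upper) bounds on card(R union S)."""
    return max(card_r, card_s), card_r + card_s


if __name__ == "__main__":
    # Assumed unit times and data volumes, not taken from the lecture.
    print(total_time(t_msg=10, t_tr=1, x=100, y=200))     # 320
    print(response_time(t_msg=10, t_tr=1, x=100, y=200))  # 210

    # Assumed statistics for a numeric attribute with values in [0, 100].
    sal = AttrStats(distinct=50, min_val=0, max_val=100)
    print(card_select(sf_gt(sal, 75), 400))               # 100.0
    print(union_bounds(400, 300))                         # (400, 700)
```

With these sample values, total time adds up both transfers (320 time units), while response time only counts the slower of the two parallel transfers (210 time units). This is why an optimizer that minimizes response time may prefer a plan with more parallel work even if its total time is higher.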