Distributed Database Management Systems: Lecture 34. The main topics covered in this chapter include: query optimization; centralized QO; best access path; join processing; QO in distributed environment; single relation queries: executed according to the best access path;...
In the previous lecture
• Concluded Data Localization
• Query Optimization
  – Components: search space, cost model, search strategy
  – Search space consists of equivalent query trees
  – Search strategies could be static, dynamic or randomized
  – Cost model considers response time and total time
• Query Optimization
  – Transmission cost is the most important
  – Another major factor is the size of intermediate tables
  – Database statistics are used to estimate the size of intermediate tables
  – Selectivity factor, cardinality and size are some major figures

In today's lecture
• Query Optimization
• Centralized QO
  – Best access path
  – Join Processing
• QO in Distributed Environment

• Input to the optimizer is a query tree generated through QD (query decomposition)
• Single relation queries: executed according to the best access path
• Queries involving joins are optimized in three steps
  1. Determine the possible orderings of joins
  2. Determine the cost of each ordering
  3. Choose the join ordering with minimal cost

• Cost model assigns (estimates) costs to operations based on the cardinality of operands
• Cardinalities are obtained from database statistics
• Two major steps in the optimization algorithm
  – Best access path for each individual relation with a predicate
  – The best join ordering; two possibilities for performing a join

1- Nested loops
• for each tuple of the external relation (cardinality n1)
• for each tuple of the internal relation (cardinality n2)
  – join the two tuples if the join predicate is true
• Complexity: n1 ∗ n2 (no index)

2- Merge join
• Sorted relations
• Merge relations
• Complexity: n1 + n2 if relations are previously sorted

Example
• EMP has an index on eNo
• ASG has an index on pNo
• PROJ has an index on pNo and an index on pName
[Figure: join graph. EMP and ASG are joined on eNo; ASG and PROJ are joined on pNo.]

1- Choose the best access path to each relation
• EMP: sequential scan (no selection on EMP)
• ASG: sequential scan (no selection on ASG)
• PROJ: index on pName (there is a selection on PROJ based on pName)

2- Determine the best join ordering
  – In total, 3! orderings are possible
  – Rather than computing the cost of all of them, some are pruned
  – Shown in the tree below
[Figure: join-ordering tree. First level: EMP, ASG, PROJ. Second level: EMP ⋈ ASG, EMP × PROJ, ASG ⋈ EMP, ASG ⋈ PROJ, PROJ ⋈ ASG, PROJ × EMP. Last level: (ASG ⋈ EMP) ⋈ PROJ and (PROJ ⋈ ASG) ⋈ EMP.]

Join Ordering in Fragmented Queries
• It is important
• Assumptions
• Two alternatives
  – Join Ordering
  – Replaced by Semi-Joins
• Former is more difficult

Join Ordering
• Two relations: move the smaller relation to the other site
  – If size(R) < size(S): send R to the site of S
  – If size(S) < size(R): send S to the site of R
• More than two relations
  – Calculate all possible costs
  – Requires computing the size of intermediate tables
  – Difficult! Let's see why
  – EMP ⋈ ASG ⋈ PROJ
[Figure: join graph with EMP at site 1, ASG at site 2 and PROJ at site 3. EMP and ASG are joined on eNo; ASG and PROJ are joined on pNo.]

• Strategy 1: EMP → site 2; site 2 computes EMP' = EMP ⋈ ASG; EMP' → site 3; site 3 computes EMP' ⋈ PROJ
• Strategy 2: ASG → site 1; site 1 computes EMP' = EMP ⋈ ASG; EMP' → site 3; site 3 computes EMP' ⋈ PROJ
• Strategy 3: ASG → site 3; site 3 computes ASG' = PROJ ⋈ ASG; ASG' → site 1; site 1 computes EMP ⋈ ASG'
• Strategy 4: PROJ → site 2; site 2 computes PROJ' = PROJ ⋈ ASG; PROJ' → site 1; site 1 computes EMP ⋈ PROJ'
• Strategy 5: EMP, PROJ → site 2; site 2 computes PROJ ⋈ ASG ⋈ EMP

Which one to choose
• We need to know
  – Size of the operand tables
  – Estimated size of the intermediate tables
• Computing all possibilities could be lengthy
• Heuristic: consider only the size of the tables

Thanks
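The two join methods named in this lecture, nested loops and merge join, can be sketched in a few lines. This is only an illustrative sketch: the function names, the tuple layouts (eNo, eName) and (eNo, pNo), and the sample rows are assumptions made here, not part of the lecture. The merge join assumes both inputs are already sorted on the join attribute, which is the case where the n1 + n2 complexity applies.

```python
def nested_loop_join(outer, inner, predicate):
    """Compare every outer tuple with every inner tuple (n1 * n2 comparisons)."""
    result = []
    for o in outer:              # external relation, cardinality n1
        for i in inner:          # internal relation, cardinality n2
            if predicate(o, i):  # join the two tuples if the predicate holds
                result.append((o, i))
    return result


def merge_join(left, right, key_left, key_right):
    """Equi-join of two inputs that are already sorted on their join keys."""
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_left(left[i]), key_right(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Find the run of right tuples that share this key.
            j_end = j
            while j_end < len(right) and key_right(right[j_end]) == kl:
                j_end += 1
            # Pair every left tuple with this key against that run.
            while i < len(left) and key_left(left[i]) == kl:
                for r in right[j:j_end]:
                    result.append((left[i], r))
                i += 1
            j = j_end
    return result


# Hypothetical sample data, both lists sorted on eNo.
EMP = [(1, "A"), (2, "B"), (3, "C")]         # (eNo, eName)
ASG = [(1, "P1"), (1, "P2"), (3, "P1")]      # (eNo, pNo)

nl_result = nested_loop_join(EMP, ASG, lambda e, a: e[0] == a[0])
mj_result = merge_join(EMP, ASG, lambda e: e[0], lambda a: a[0])
```

Both calls produce the same pairs on this data; the difference is only in how many comparisons are needed to find them.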
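The three optimization steps for join queries (enumerate the orderings, cost each one, choose the cheapest) can be illustrated for EMP, ASG and PROJ as follows. This is a simplified sketch, not the exact pruning shown in the lecture's tree: it drops only the orderings that would need a Cartesian product, and it uses a plain nested-loop cost. The cardinalities and selectivity factors are hypothetical, with PROJ taken as already reduced by the selection on pName.

```python
from fractions import Fraction
from itertools import permutations

# Hypothetical cardinalities (not from the lecture); PROJ is after the selection.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 5}

# Join graph EMP -eNo- ASG -pNo- PROJ with hypothetical join selectivity factors.
SF = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 50),
}


def plan_cost(order):
    """Nested-loop cost of a left-deep plan; None if it needs a Cartesian product."""
    joined = [order[0]]
    card = CARD[order[0]]
    cost = 0
    for rel in order[1:]:
        factors = [SF[frozenset((r, rel))] for r in joined
                   if frozenset((r, rel)) in SF]
        if not factors:
            return None                  # no join predicate: prune this ordering
        cost += card * CARD[rel]         # n1 * n2 comparisons for this join
        selectivity = 1
        for f in factors:
            selectivity *= f
        card = card * CARD[rel] * selectivity   # estimated intermediate size
        joined.append(rel)
    return cost


# Step 1: all 3! orderings. Step 2: cost each one, pruning Cartesian products.
candidates = {}
for order in permutations(CARD):
    cost = plan_cost(order)
    if cost is not None:
        candidates[order] = cost

# Step 3: choose the ordering with minimal estimated cost.
best_order = min(candidates, key=candidates.get)
best_cost = candidates[best_order]
```

With these assumed figures, the orderings that join ASG with the reduced PROJ first come out cheapest, because the small intermediate result keeps the second join inexpensive.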
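To see why the choice among the five distributed strategies depends on intermediate table sizes, the data shipped by each strategy can be added up. The sketch below assumes EMP at site 1, ASG at site 2 and PROJ at site 3, as implied by the strategies. All sizes, including the two intermediate results, are hypothetical numbers chosen for illustration, and only transmission cost is counted.

```python
# Hypothetical sizes in tuples; none of these numbers come from the lecture.
SIZE = {
    "EMP": 400,
    "ASG": 1000,
    "PROJ": 50,
    "EMP_ASG": 1000,    # assumed size of the intermediate result EMP join ASG
    "ASG_PROJ": 1000,   # assumed size of the intermediate result ASG join PROJ
}


def smaller_to_ship(r, s):
    """Two-relation rule: ship the smaller relation to the other site."""
    return r if SIZE[r] < SIZE[s] else s


# What each strategy sends across the network.
TRANSFERS = {
    "Strategy 1": ["EMP", "EMP_ASG"],    # EMP to site 2, then EMP' to site 3
    "Strategy 2": ["ASG", "EMP_ASG"],    # ASG to site 1, then EMP' to site 3
    "Strategy 3": ["ASG", "ASG_PROJ"],   # ASG to site 3, then ASG' to site 1
    "Strategy 4": ["PROJ", "ASG_PROJ"],  # PROJ to site 2, then PROJ' to site 1
    "Strategy 5": ["EMP", "PROJ"],       # EMP and PROJ to site 2
}

costs = {name: sum(SIZE[x] for x in shipped)
         for name, shipped in TRANSFERS.items()}
best_strategy = min(costs, key=costs.get)
```

Strategies 1 to 4 each need an intermediate size that must be estimated, while Strategy 5 needs only the sizes of the operand tables. That is why a heuristic based on table sizes alone is attractive when enumerating every possibility would take too long.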