Point Access Method

Group: Lâm Tuấn Anh, Nguyễn Đình Tân Anh, Lê Minh Châu

Outline
• Spatial Data
• Main Memory Structure
• Point Access Methods

Spatial Data: Characteristics of Spatial Data
• Complex structure
• Dynamic
• Spatial databases tend to be large
• There is no standard algebra defined on spatial data
• Many spatial operators are not closed
• Spatial database operators are more expensive than standard relational operators
• There is no total order among spatial objects

Queries in Spatial Data
• Exact Match Query (EMQ)
  - Condition: given an object o' with spatial extent o'.G in d-dimensional Euclidean space
  - Target: find all objects o with the same spatial extent as o'
• Point Query (PQ)
  - Condition: given a point p in d-dimensional Euclidean space
  - Target: find all objects o overlapping p
• Enclosure Query (EQ)
  - Condition: given an object o' with spatial extent o'.G in d-dimensional Euclidean space
  - Target: find all objects o enclosing o'
• Spatial Join
  - Condition: given two collections R and S of spatial objects and a spatial predicate θ
  - Target: find all pairs of objects (o, o') ∈ R × S for which θ(o.G, o'.G) evaluates to true

Spatial Data: Requirements for Multidimensional Access Methods
• Dynamics
• Secondary/tertiary storage management
• Broad range of supported operations
• Independence of the input data and insertion sequence
• Simplicity
• Scalability
• Time efficiency
• Space efficiency
• Concurrency and recovery

Main Memory Structure
[Figure 9: Running example. Legend: p_i = i-th point, r_i = i-th polygon, c_i = i-th centroid, m_i = i-th minimum bounding box]

EXCELL
• Decomposes the universe regularly: all grid cells are of equal size
• Each new split halves all cells and therefore doubles the directory size

The Two-Level Grid File
• Uses a second grid file to manage the grid directory
• The first of the two levels is called
the root directory; the second level is the actual grid directory
• The root directory contains pointers to the directory pages of the lower level, which in turn contain pointers to the data pages
• Splits are often confined to subdirectory regions without affecting the surroundings too much => slower directory growth
• Does not solve the problem of superlinear directory size

The Twin Grid File
• Increases space utilization by introducing a second grid file
• The relationship between the two grid files is not hierarchical but somewhat more balanced
• Both grid files span the whole universe
• Data are distributed between the two files dynamically

Hierarchical Access Method
• Based on binary or multi-way tree structures
• Like hashing, stores data in buckets
• Each bucket is a leaf node and corresponds to a disk page
• Interior nodes of the tree guide the search
• Search: top-down tree traversal
• The methods differ mainly in the characteristics of their regions

Hierarchical Access Method: k-d-B-tree
• Combination of the adaptive k-d-tree and the B-tree
• Partitions the universe like an adaptive k-d-tree
• Associates subspaces with tree nodes
• Interior nodes are intervals
• Nodes on the same level are mutually disjoint
• Perfectly balanced (like a B-tree)
• Search: straightforward, as in a k-d-tree
• Insertion: search for the right bucket; if required, split it and move half of the data to the new bucket
• Deletion: search and remove; if necessary, merge the node with its siblings

Hierarchical Access Method: LSD tree
• Directory is organized the same way as an adaptive k-d-tree
• Better adaptation to the data distribution (compared with fixed binary partitioning)
• External balancing property: the heights of external subtrees differ by at most one
• Combines two split strategies to accommodate skewed data:
  - Data-dependent: based on the data; tries to achieve the most balanced structure (an equal number of data items on both sides of the split)
  - Distribution-dependent: splits at a fixed dimension and position (a known data distribution is assumed)

Hierarchical Access Method: Buddy tree
• Dynamic hashing scheme with a tree structure (hybrid)
• The tree is built by consecutive insertions
• Cuts the universe into equal parts with iso-oriented hyperplanes
• Interior nodes: a partition and an interval (the MBB of the points or intervals below the node)
• Intervals of nodes on the same level are mutually disjoint
• Leaves hold the data (as in the other trees)
• Each directory node has at least two entries => the tree may not be balanced
• When a node splits, the MBBs of the two intervals are computed to reflect the current situation => aims at high selectivity at the directory level
• Except for the root, only one pointer refers to each directory page => guarantees linear growth of the directory

Hierarchical Access Method: BANG file (Balanced and Nested Grid)
• A hybrid method
• Divides the universe into intervals (boxes), similar to the grid file
• Difference: bucket regions may intersect
• Can form non-rectangular bucket regions by taking the geometric difference of two intervals (nesting)
• Increased storage utilization: redistributes data between buckets during insertion
• A balanced search tree manages the directory

Hierarchical Access Method: BANG file (Balanced and Nested Grid), example
• First 3 rectangles: R1: R2, R5, R6; then R3 and R4 in R2 and R5
• Regions are represented by bit interleaving (* = universe)
• A point search may require traversing the entire directory in depth-first order

Hierarchical Access Method: hB-tree
• Related to the k-d-B-tree; uses k-d-trees to organize the space represented by its interior nodes
• Difference in splitting: based on
multiple attributes
• Node regions are not box-shaped

Hierarchical Access Method: BV-tree
• Tries to solve the d-dimensional B-tree problem, i.e. to generalize the B-tree to higher dimensions
• Idea: keep the major strengths of the B-tree by relaxing the balancing and space utilization requirements
• The BV-tree is not balanced
• At least 33% space utilization (vs. 50% for the B-tree)

Question
• What form does a Point Query take in spatial data?
• Which methods belong to the Multidimensional Hashing Methods?
  - Grid File
  - K-d Tree
  - Linear Hashing
  - EXCELL

[...]

Main Memory Structure
• Designed for main memory applications, where all the data are available without accessing the disk
• Do not take secondary storage management into account explicitly
• In many spatial database applications, the amount of data to be managed is notoriously large

Point Access Methods
• Multidimensional Hashing
• Hierarchical Access Method

Multidimensional Hashing
• No total order for objects in two- and...
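The slides note that there is no total order among spatial objects, and that the BANG file represents its regions by bit interleaving with * standing for the universe. A minimal sketch of bit interleaving for 2-dimensional integer grid coordinates follows. It imposes a linear (Z-order) key on grid cells and, under the assumption that each binary split contributes one bit, identifies a region by a prefix of that key. The 4-bit coordinate width, taking the x bit first in each pair, and the names `interleave`, `region_id` and `region_contains` are all hypothetical choices for illustration; this is not the BANG file's actual directory code.

```python
BITS = 4  # bits per coordinate (hypothetical; kept small so keys stay readable)


def interleave(x: int, y: int, bits: int = BITS) -> str:
    """Interleave the bits of x and y, most significant first, x bit before y bit."""
    if not (0 <= x < 2 ** bits and 0 <= y < 2 ** bits):
        raise ValueError("coordinates out of range")
    out = []
    for i in range(bits - 1, -1, -1):
        out.append(str((x >> i) & 1))
        out.append(str((y >> i) & 1))
    return "".join(out)


def region_id(x: int, y: int, level: int, bits: int = BITS) -> str:
    """Identifier of the cell containing (x, y) after `level` binary splits.

    Each split halves the current region along x, then y, alternately, so the
    identifier is the first `level` bits of the interleaved key; '*' is the universe.
    """
    return interleave(x, y, bits)[:level] or "*"


def region_contains(outer: str, inner: str) -> bool:
    """A region contains another if its identifier is a prefix of the other's."""
    if outer == "*":
        return True
    return inner != "*" and inner.startswith(outer)
```

Sorting points by `interleave(x, y)` gives them a one-dimensional order that ordinary hashing or B-tree structures can use. Nested regions correspond to identifiers that extend one another, which is one way to read "nesting" in the BANG file slides.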
Main Memory Structure
[Figure 10: k-d construction]
[Figure 11: k-d...]
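The k-d construction figures cannot be recovered from this extraction. As a rough stand-in, here is a minimal sketch of an adaptive k-d-tree, the structure that the k-d-B-tree and LSD tree slides build on. Each split is placed at the median, so both sides receive about the same number of points, and all data live in leaf buckets. Several details are assumptions rather than taken from the slides: cycling through the dimensions by depth, the bucket capacity, and the names `Node`, `build` and `contains`. An adaptive k-d-tree is free to choose split dimensions differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    # Interior node: split dimension `axis` and split value; leaf: a bucket of points.
    axis: int = -1
    split: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None
    bucket: List[Point] = field(default_factory=list)

    def is_leaf(self) -> bool:
        return self.left is None


def build(points: Sequence[Point], capacity: int = 2, depth: int = 0) -> Node:
    """Split at the median coordinate until every bucket holds at most `capacity` points."""
    pts = list(points)
    if len(pts) <= capacity:
        return Node(bucket=pts)
    axis = depth % len(pts[0])  # cycle through the dimensions (hypothetical choice)
    pts.sort(key=lambda q: q[axis])
    split = pts[len(pts) // 2][axis]
    left = [q for q in pts if q[axis] < split]
    right = [q for q in pts if q[axis] >= split]  # never empty: contains the median point
    if not left:
        # The median equals the minimum on this axis (many duplicates): keep an oversized bucket.
        return Node(bucket=pts)
    return Node(axis=axis, split=split,
                left=build(left, capacity, depth + 1),
                right=build(right, capacity, depth + 1))


def contains(node: Node, p: Point) -> bool:
    """Exact-match search: top-down traversal to the single bucket that could hold p."""
    while not node.is_leaf():
        node = node.left if p[node.axis] < node.split else node.right
    return p in node.bucket
```

Because the split values come from the data rather than fixed positions, the tree adapts to skewed distributions. This mirrors the data-dependent split strategy listed for the LSD tree.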
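The query types defined in the Queries in Spatial Data slides (EMQ, PQ, EQ and spatial join) can be made concrete for the common case where each spatial extent o.G is approximated by a d-dimensional axis-parallel box. The sketch below assumes closed boxes and uses a hypothetical `Box` type with hypothetical function names. It is a brute-force scan, not an access method: the point of the structures above is to avoid exactly this full scan.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass(frozen=True)
class Box:
    """Closed box [lo_1, hi_1] x ... x [lo_d, hi_d], standing in for a spatial extent o.G."""
    lo: Point
    hi: Point

    def contains_point(self, p: Point) -> bool:
        return all(l <= x <= h for l, x, h in zip(self.lo, p, self.hi))

    def encloses(self, other: "Box") -> bool:
        # other lies inside self in every dimension
        return all(l1 <= l2 and h2 <= h1
                   for l1, l2, h1, h2 in zip(self.lo, other.lo, self.hi, other.hi))

    def intersects(self, other: "Box") -> bool:
        return all(l1 <= h2 and l2 <= h1
                   for l1, l2, h1, h2 in zip(self.lo, other.lo, self.hi, other.hi))


def exact_match_query(objects: Iterable[Box], q: Box) -> List[Box]:
    """EMQ: all objects with the same spatial extent as q."""
    return [o for o in objects if o == q]


def point_query(objects: Iterable[Box], p: Point) -> List[Box]:
    """PQ: all objects overlapping the point p."""
    return [o for o in objects if o.contains_point(p)]


def enclosure_query(objects: Iterable[Box], q: Box) -> List[Box]:
    """EQ: all objects enclosing q."""
    return [o for o in objects if o.encloses(q)]


def spatial_join(r: Sequence[Box], s: Sequence[Box],
                 theta: Callable[[Box, Box], bool]) -> List[Tuple[Box, Box]]:
    """Spatial join: all pairs (o, o') in R x S with theta(o.G, o'.G) true (nested loops)."""
    return [(o, o2) for o in r for o2 in s if theta(o, o2)]
```

For example, `spatial_join(R, S, Box.intersects)` is an intersection join. With closed boxes, an object counts as enclosing itself, so an EQ result includes any object equal to the query box.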