Keyword Search in Databases

3. GRAPH-BASED KEYWORD SEARCH

3.3. STEINER TREE-BASED KEYWORD SEARCH

The theorem also specifies the delay in terms of the running time of Q-subtree(). Recall that n and m are the numbers of nodes and edges of G_D, respectively, l is the number of keywords, and n_i is the number of nodes in the i-th output tree. Note that there are at most $2^m$ trees, i.e., $i \le 2^m$.

Theorem 3.5 [Kimelfeld and Sagiv, 2006b] Consider a data graph G_D and a query Q = {k_1, ..., k_l}.
• EnumTreePD enumerates all the Q-subtrees of G_D in ranked order if Q-subtree() returns an optimal tree.
• If Q-subtree() returns a θ-approximation of the optimal tree, then EnumTreePD enumerates in θ-approximate ranked order.
• If Q-subtree() terminates in time t(n, m, l), then EnumTreePD outputs the (i + 1)-th answer with delay $O(n_i(t(n, m, l) + \log(n \cdot i) + n_i))$.

The task of enumerating Q-subtrees is thus transformed into finding an optimal Q-subtree under a set of constraints, which are specified as inclusion edges I and exclusion edges E. The constraints given by exclusion edges are easy to handle: we simply remove the edges in E from the data graph G_D. Therefore, in the following we only consider the inclusion edges I; recall that I is the set of edges that every answer in the subspace must contain.

A partial tree (PT) is any directed subtree of G_D. A set P of PTs is called a set of PT constraints if the PTs in P are pairwise node-disjoint. The set of leaves of the PTs in P is denoted by leaves(P).

Proposition 3.6 [Kimelfeld and Sagiv, 2006b] The algorithm EnumTreePD can be executed efficiently so that, for every generated set of inclusion edges I, the subgraph of G_D induced by I forms a set of PT constraints P such that leaves(P) ⊆ {k_1, ..., k_l} and P has at most two PTs.

The Serialize function at line 7 of EnumTreePD orders the set of edges so that the newly generated inclusion edges satisfy the above proposition, i.e., |P| ≤ 2. The general idea of Serialize is shown in Figure 3.5. Assume that the tree in Figure 3.5 is the tree T obtained at line 6 of EnumTreePD. We regard the problem as recursively adding edges from E(T) \ I into P, and we distinguish two cases: |P| = 1 and |P| = 2. If |P| = 1, i.e., P = {T_1}, there are two choices: either add the incoming edge of the root of T_1 (e.g., edge e_1), or add the incoming edge of a keyword node that is not in V(T_1) (e.g., the incoming edge of k_2 or k_4). In the other case, P = {T_1, T_2}, there are also two choices: either add the incoming edge of the root of T_1 (e.g., edge e_1), or add the incoming edge of the root of T_2 (e.g., edge e_2); eventually, T_1 and T_2 are merged into one tree.

[Figure 3.5: Serialize. A tree T containing the partial trees T_1 and T_2, the edges e_1 and e_2, and the keyword nodes k_1, k_2, k_3, k_4.]

Algorithm 20 SuperTree(G, 𝒯)
Input: a data graph G, and a set of PT constraints 𝒯.
Output: a minimum weight super tree of 𝒯 in G.
1: G′ ← collapse(G, 𝒯)
2: R ← {root(T) | T ∈ 𝒯}
3: T′ ← SteinerTree(G′, R)
4: if T′ ≠ ⊥ then
5:     return restore(G, T′, 𝒯)
6: else
7:     return ⊥

In P, there are two types of PTs. A reduced PT (RPT) has a root with at least two children, whereas a nonreduced PT (NPT) has a root with only one child. As a special case, a single node is considered an RPT. Without loss of generality, we can add to P, as a single-node RPT, every keyword node that does not appear in leaves(P). Thus, from now on, we assume that leaves(P) = {k_1, ..., k_l}; P may then contain more than two PTs, but it has at most two NPTs and at most two PTs with more than one node. We denote by P|RPT and P|NPT the set of all the RPTs and the set of all the NPTs of P, respectively.

In the following, we discuss different implementations of Q-subtree(G_D, Q, I, E). We first create another graph G by removing the edges in E from G_D; I forms a set of PT constraints as described above. So we assume that the inputs of the algorithm are a data graph G and a set of PT constraints P with leaves(P) = Q.
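The part of this preprocessing that turns the inclusion edges I into PT constraints (group I into trees, pad with single-node keyword RPTs, classify each PT as RPT or NPT) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: edges are assumed to be (u, v) pairs directed from u to v, and the function name build_pt_constraints and the dict-based PT representation are hypothetical.

```python
def build_pt_constraints(inclusion_edges, keywords):
    """Group inclusion edges I into PTs, pad with keyword RPTs, classify.

    Edges are (u, v) pairs meaning u -> v. Each returned PT is a dict with
    keys 'root', 'nodes', 'edges' and 'kind' ('RPT' or 'NPT').
    """
    children, parent = {}, {}
    for u, v in inclusion_edges:
        if v in parent:  # a node with two incoming edges cannot lie in a tree
            raise ValueError(f"node {v!r} has two incoming inclusion edges")
        parent[v] = u
        children.setdefault(u, []).append(v)

    pts, seen = [], set()
    # Roots are tails that never appear as heads; since every node has at most
    # one parent, the PTs grown from different roots are node-disjoint.
    for root in [x for x in children if x not in parent]:
        nodes, edges, stack = set(), [], [root]
        while stack:  # DFS over the directed subtree below root
            x = stack.pop()
            nodes.add(x)
            for c in children.get(x, []):
                edges.append((x, c))
                stack.append(c)
        seen |= nodes
        pts.append({"root": root, "nodes": nodes, "edges": edges})
    if seen != set(parent) | set(children):
        raise ValueError("inclusion edges contain a cycle")

    for k in keywords:
        if k not in seen:
            # WLOG: a keyword not covered by any PT becomes a single-node RPT.
            pts.append({"root": k, "nodes": {k}, "edges": []})

    for pt in pts:
        # NPT: root has exactly one child; RPT: >= 2 children or a single node.
        n_children = len(children.get(pt["root"], []))
        pt["kind"] = "NPT" if n_children == 1 else "RPT"
    return pts
```

Because each node may have at most one incoming inclusion edge, node-disjointness of the PTs comes for free; the only extra validity check needed is for cycles.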
Finding Minimum Weight Super Tree of P: We first introduce a procedure that finds a minimum weight super tree of P, i.e., a tree T that contains every PT of P as a subtree. Sometimes the super tree found is also an optimal Q-subtree, but it may not be reduced. For example, for the two PTs T_1 and T_2 in the upper left part of Figure 3.6, the tree consisting of T_1, T_2, and the edge ⟨v_2, v_5⟩ is a minimum weight super tree, but it is not reduced, so it is not a Q-subtree. Algorithm 20 (SuperTree [Kimelfeld and Sagiv, 2006b]) finds a minimum weight super tree of 𝒯 if one exists. It reduces the problem to a Steiner tree problem by collapsing the graph G according to 𝒯.

[Figure 3.6: Execution example of finding supertree [Kimelfeld and Sagiv, 2006b]. Panels: 1. collapse(), 2. ReducedSubtree(), 3. restore(), Output. The panels show G_D over the nodes v_0, ..., v_9 with the PTs T_1 and T_2, the collapsed graph G′_D, and the tree T′ over the roots v_1 and v_5.]

The graph collapse(G, 𝒯) is the result of collapsing all the subtrees in 𝒯, and it can be obtained as follows.
• Delete all the edges ⟨u, v⟩ where v is a non-root node of a PT T ∈ 𝒯 and ⟨u, v⟩ is not an edge of T.
• For each remaining edge ⟨u, v⟩ such that u is a non-root node of a PT T ∈ 𝒯 and ⟨u, v⟩ is not an edge of T, add an edge ⟨root(T), v⟩. The weight of the edge ⟨root(T), v⟩ is the minimum among the weights of all such edges (including the original edge ⟨root(T), v⟩ of G, if there is one).
• Delete all the non-root nodes of the PTs in 𝒯 and their incident edges.

As an example, the top part of Figure 3.6 shows how two node-disjoint subtrees T_1 and T_2 are collapsed. In this figure, the edge weights are not shown and are assumed to be equal. In the collapsed graph G′, the algorithm finds a minimum directed Steiner tree T′ that contains all the root nodes of the PTs in 𝒯 (line 3); this step can be accomplished by existing algorithms. Next, it restores T′ into a super tree of 𝒯 in G. First, it adds back all the edges of each PT T ∈ 𝒯 to T′.
Then, it replaces each edge of T′ with the original edge from which the collapse step derived it (possibly the edge itself). Figure 3.6 shows the execution of SuperTree on the input consisting of G and 𝒯 = {T_1, T_2}. In the first step, G′ is obtained from G by collapsing the subtrees T_1 and T_2. The second step constructs a minimum directed Steiner tree T′ of G′ with respect to the set of roots {v_1, v_5}. Finally, T_1 and T_2 are restored in T′ and the result is returned.

[Figure 3.7: The high-level structure of a reduced minimum Steiner tree, with root r, the PTs T_1 and T_2, and the lower-level PTs T_11 and T_12.]

Theorem 3.7 [Kimelfeld and Sagiv, 2006b] Consider a data graph G_D and a set 𝒯 of PT constraints. Let n and m be the numbers of nodes and edges of G_D, respectively, and let t be the number of PTs in 𝒯.
• MinWeightSuperTree, in which SteinerTree is implemented by DPBF, returns a minimum weight super tree of 𝒯 if one exists, or ⊥ otherwise. The running time is $O(3^t n + 2^t((l + n)\log n + m))$.
• AppWeightSuperTree, in which SteinerTree is implemented by a θ(n, m, t)-approximation algorithm with running time f(n, m, t), returns a θ(n, m, t)-approximate minimum weight super tree of 𝒯 if one exists, or ⊥ otherwise. The running time is $O(m \cdot t + f(n, m, t))$.

Finding minimum weight Q-subtree under P: The minimum weight super tree of P returned by MinWeightSuperTree is sometimes a Q-subtree, but at other times it is not reduced. This happens because some PTs in P are NPTs, and the root of one of these NPTs can become the root of the tree returned by MinWeightSuperTree.
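The collapse and restore steps of SuperTree, together with the root condition that explains why its output may not be reduced, can be sketched as below. This is a sketch under stated assumptions, not the book's code: the graph is assumed to be a Python dict from directed edges (u, v) to weights, each PT a (root, nodes, edges) triple, and the names collapse, restore and root_is_branching are hypothetical. SteinerTree itself (line 3 of Algorithm 20) is not implemented; T′ is passed in as an edge list. Dropping self-loops during collapsing is an extra assumption of this sketch.

```python
def collapse(graph, pts):
    """collapse(G, T): fold every PT into its root.

    graph: dict mapping a directed edge (u, v) to its weight.
    pts:   list of (root, nodes, edges) triples; the PTs are node-disjoint.
    Returns (collapsed, origin): the collapsed graph and, for each of its
    edges, the original edge of G it was derived from.
    """
    owner = {x: root for root, nodes, _ in pts for x in nodes if x != root}
    collapsed, origin = {}, {}
    for (u, v), w in graph.items():
        if v in owner:
            # Rule 1 drops edges entering a non-root node from outside its PT;
            # rule 3 drops the PT's own edges, whose heads are non-root too.
            continue
        # Rule 2: an edge leaving a non-root node now leaves the PT's root.
        new = (owner.get(u, u), v)
        if new[0] == new[1]:
            continue  # sketch assumption: drop self-loops, no tree uses them
        if new not in collapsed or w < collapsed[new]:
            collapsed[new] = w  # keep the lightest of the parallel edges
            origin[new] = (u, v)
    return collapsed, origin


def restore(steiner_edges, origin, pts):
    """restore(G, T', T): turn a Steiner tree T' of G' into a super tree in G."""
    tree = {e for _, _, edges in pts for e in edges}  # add back all PT edges
    tree |= {origin[e] for e in steiner_edges}  # map T' edges to originals
    return tree


def root_is_branching(tree_edges):
    """Root condition of a reduced tree: the root has >= 2 children.

    Assumes a non-empty edge set whose leaves are all keyword nodes.
    """
    heads = {v for _, v in tree_edges}
    roots = {u for u, _ in tree_edges} - heads
    if len(roots) != 1:
        raise ValueError("edges do not form a single rooted tree")
    (root,) = roots
    return sum(1 for u, _ in tree_edges if u == root) >= 2
```

For example, with PTs 1→2 and 5→6 and an edge ⟨2, 5⟩ of weight 2 in G, collapse maps that edge to ⟨1, 5⟩; restoring a Steiner tree consisting of ⟨1, 5⟩ yields {⟨1, 2⟩, ⟨2, 5⟩, ⟨5, 6⟩}, whose root has a single child, so root_is_branching returns False. This is the same kind of non-reduced super tree as the one with edge ⟨v_2, v_5⟩ described in the text. Keeping origin next to the collapsed graph is what lets restore swap each edge of T′ for the edge of G it came from.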
So, if we can find the true root of the minimum weight Q-subtree, then we can find the answer using MinWeightSuperTree. Now let us analyze a general minimum weight Q-subtree, as shown in Figure 3.7, where T_1, ..., T_11, ... are PTs of P, solid arrows denote paths inside a PT, and a dotted arrow denotes a path that contains no node of P except its start and end nodes. Node r is the root node, and it may itself be the root of a PT in P. For each PT T ∈ P, the tree contains at most one edge entering root(T) from outside T, and no edge entering a non-root node of T from outside T. For every T ∈ P, let level(T) be the number of other PTs on the path from r to T. For example, level(T_1) = level(T_2) = 0 and level(T_11) = level(T_12) = 1. We only care about the PTs at level 0, which we call top-level PTs and denote by 𝒯_top.

First, assume |𝒯_top| ≥ 2. We use T_top to denote the subtree consisting of all the paths from r to the root nodes of the PTs in 𝒯_top, together with their nodes. We denote the union of T_top and 𝒯_top by T⁺_top, i.e., T⁺_top = T_top ∪ 𝒯_top. The case |𝒯_top| = 1 is implicitly captured by the case |𝒯_top| ≥ 2. Note that T_top may not be reduced, i.e., its root may have only one child, but T⁺_top is a reduced tree.

The algorithm to find a minimum weight Q-subtree under the PT constraints P consists of three steps. For now, we assume that the set of top-level PTs, 𝒯_top, is known.
1. Find a minimum weight super tree T_top in G_1 with the root nodes of the PTs in 𝒯_top as the terminal nodes, where G_1 is obtained from G by deleting all the nodes of P except the roots of the PTs in 𝒯_top. It is easy to verify that T_top can be found this way.
2. Take the union of T_top and 𝒯_top to get the expanded tree T⁺_top.
3. Find a minimum weight super tree of (P \ 𝒯_top) ∪ {T⁺_top} in G_2, where G_2 is obtained from G by deleting all the incoming edges of root(T⁺_top). This step ensures that root(T⁺_top) is the root of the final tree.
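To make level(T), 𝒯_top, T_top and T⁺_top concrete, the sketch below computes them for an answer tree that is already known. It is illustrative only: the tree is assumed to be stored as a child-to-parent dict, each PT as a (root, nodes) pair keyed by a name, and the function name top_level_split is hypothetical. It analyzes a given tree; it does not search for one.

```python
def top_level_split(parent, pts):
    """Compute level(T), the top-level PTs, T_top and T+_top for an answer tree.

    parent: dict mapping each non-root node of the answer tree to its parent.
    pts:    dict mapping a PT name to a (root, nodes) pair.
    Returns (levels, top, t_top, t_top_plus); the last two are edge sets.
    """
    owner = {x: name for name, (_, nodes) in pts.items() for x in nodes}
    levels, paths = {}, {}
    for name, (root, _) in pts.items():
        others, path, x = set(), [], root
        while x in parent:  # walk from root(T) up to the tree root r
            path.append((parent[x], x))
            x = parent[x]
            if owner.get(x, name) != name:
                others.add(owner[x])  # another PT lies on the path
        levels[name] = len(others)
        paths[name] = path
    top = {name for name, lvl in levels.items() if lvl == 0}
    # T_top: all the paths from r down to the roots of the top-level PTs.
    t_top = {e for name in top for e in paths[name]}
    # T+_top adds the edges of the top-level PTs themselves.
    pt_edges = {(parent[x], x) for name in top
                for x in pts[name][1] if x != pts[name][0]}
    return levels, top, t_top, t_top | pt_edges
```

On a tree shaped like Figure 3.7 (two PTs hanging from r and two more PTs below T_1), this returns level 0 for T_1 and T_2 and level 1 for T_11 and T_12, and the resulting T⁺_top has a root with two children, matching the claim that T⁺_top is reduced.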
The above steps find a minimum weight Q-subtree under the constraints P, given 𝒯_top. Usually, 𝒯_top is not easy to find. However, we can resort to an exponential time algorithm that enumerates all the subsets of P and finds an optimal Q-subtree with each subset as 𝒯_top. The tree with the minimum weight is the final Q-subtree under the constraints P. The detailed algorithm is shown in Algorithm 21 (MinWeightQSubtree [Kimelfeld and Sagiv, 2006b]). In lines 1-2, it handles two special cases in which P contains only one PT. The non-root nodes of the NPTs in P are removed to avoid finding a non-reduced tree (line 3). Then it enumerates all the possible sets of top-level PTs (line 5). For each candidate set of top-level PTs 𝒯_top, it first finds T_top by calling MinWeightSuperTree (line 7), then gets T⁺_top (line 9), and finally finds a minimum weight Q-subtree with root(T⁺_top) as the root (lines 10-11). Note that the data graph G is not necessarily generated as G^A_D for a keyword search problem; MinWeightQSubtree works for any general directed graph, i.e., the terminal nodes can also have outgoing edges.

Theorem 3.8 [Kimelfeld and Sagiv, 2006b] Consider a data graph G with n nodes and m edges. Let Q = {k_1, ..., k_l} be a keyword query and P be a set of p PT constraints such that leaves(P) = Q. MinWeightQSubtree returns either a minimum weight Q-subtree containing P if one exists, or ⊥ otherwise. The running time of MinWeightQSubtree is $O(4^p + 3^p((l + \log n)n + m))$.

Finding (θ + 1)-approximate minimum weight Q-subtree under P: In this part, we assume that AppWeightSuperTree can find a θ-approximation of the minimum Steiner tree in polynomial time f(n, m, t). Then, MinWeightQSubtree can be modified to find a θ-approximation of the minimum weight Q-subtree by replacing MinWeightSuperTree with AppWeightSuperTree.
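The outer loop of this exponential search can be sketched as below. The callback solve_given_top is hypothetical and stands for steps 1-3 for a fixed 𝒯_top; returning None plays the role of ⊥. Trying every non-empty subset is an assumption of this sketch, since the exact range enumerated at line 5 of Algorithm 21 is not reproduced here.

```python
from itertools import combinations


def min_over_top_sets(pts, solve_given_top):
    """Try every non-empty subset of P as the top-level set; keep the best.

    pts: the PT constraints P (any hashable items).
    solve_given_top: hypothetical callback running steps 1-3 for a fixed
        top-level set; returns (weight, tree) or None when no tree exists.
    Returns the lightest (weight, tree) pair found, or None (i.e., ⊥).
    """
    best = None
    for size in range(1, len(pts) + 1):
        for top in combinations(pts, size):
            result = solve_given_top(frozenset(top))
            if result is not None and (best is None or result[0] < best[0]):
                best = result
    return best
```

With p PTs the loop makes 2^p − 1 calls to the callback, each of which runs the super tree machinery, which is consistent with the exponential dependence on p in Theorem 3.8.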
