Keyword Search in Databases - P27

5.3. VARIATIONS OF KEYWORD SEARCH ON DATABASES

(maybe empty). For example, for the keyword query Q = {author, number, paper, XML}, one of the possible CIs is (C = {author.TID, paper.TID, paper.title contains “XML”}, a = paper.TID, F = count, w = author.TID). A CI is trivial if one of the following is satisfied: (1) C contains an attribute c ≠ a such that c functionally determines a, or (2) C contains two attributes c_i and c_j that refer to the same attribute, or c_i is a foreign key of c_j. The set of non-trivial CIs can be enumerated by using the full-text index available in the RDBMS.

After enumerating all non-trivial CIs, SQAK enumerates, for each CI = (C, a, F, w), a set of Simple Query Networks (SQNs), where each SQN is a connected subgraph of the schema graph that satisfies the following conditions:

• Total - All tables in C are contained in the SQN.
• Minimal - The SQN is no longer total if any node is removed from it.
• Node Clarity - Each node in the SQN has at most one incoming edge.

Suppose the cost of an SQN is the sum of all its edge costs and node costs. For each CI, the SQN with the smallest cost is needed, and finding it is an NP-complete problem. A greedy heuristic algorithm is proposed in SQAK. For a CI (C, a, F, w), it starts at the table o that contains the attribute a. For each of the other tables (nodes) v ∈ C, it finds the shortest path from v to o, with backtracking: if adding the path from v to o violates the node clarity condition, it backtracks and tries the next shortest path from v to o, until all nodes in C are successfully added. It then outputs the current result as a good SQN for the CI. After finding the SQN for each CI, it obtains the top-k SQNs with the smallest cost, and each of the top-k SQNs is translated into an SQL query that is returned as output.

5.3.5 SMALL DATABASE AS RESULT

Précis [Koutrika et al., 2006; Simitsis et al., 2008] returns a small database that contains only the tuples relevant to a given keyword query Q.
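As a rough illustration of what "a small database of relevant tuples" can mean, the sketch below starts from the tuples that contain a query keyword and follows foreign key references outward up to a fixed distance, stopping early once a tuple budget is reached. The adjacency-map representation, the function name `expand_from_keywords`, the parameters `max_distance` and `max_tuples`, and the toy tuple identifiers are all assumptions made for this example. They are simplified stand-ins for the constraints Précis uses, and the code is not the authors' algorithm.

```python
from collections import deque


def expand_from_keywords(neighbors, keyword_tuples, max_distance, max_tuples):
    """Collect tuples within max_distance hops of any keyword tuple.

    neighbors: dict mapping a tuple id to the tuple ids it is linked to by
               foreign key references (treated as undirected in this sketch).
    Returns a dict {tuple_id: distance} with at most max_tuples entries.
    """
    distance = {}
    queue = deque()
    # Seed the breadth-first search with the tuples that contain a keyword.
    for t in keyword_tuples:
        if len(distance) >= max_tuples:
            break
        if t not in distance:
            distance[t] = 0
            queue.append(t)
    while queue:
        current = queue.popleft()
        if distance[current] == max_distance:
            continue  # do not expand beyond the distance bound
        for nxt in neighbors.get(current, ()):
            if nxt in distance:
                continue
            if len(distance) >= max_tuples:
                return distance  # tuple budget exhausted
            distance[nxt] = distance[current] + 1
            queue.append(nxt)
    return distance


# Hypothetical toy data (not the DBLP instance of Figure 2.2):
# papers p*, authors a*, write tuples w*, cite tuples c*.
neighbors = {
    "p1": ["w1", "c1"],
    "w1": ["p1", "a1"],
    "a1": ["w1", "w2"],
    "w2": ["a1", "p3"],
    "p3": ["w2"],
    "c1": ["p1", "p2"],
    "p2": ["c1"],
}
result = expand_from_keywords(neighbors, ["p1"], max_distance=2, max_tuples=10)
small = expand_from_keywords(neighbors, ["p1"], max_distance=2, max_tuples=3)
```

Breadth-first order is used so that, when the tuple budget cuts the expansion short, the tuples closest to the keyword matches are the ones kept.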
The schema of a relational database D is modeled as a weighted graph G_S(V, E), where each relation is modeled as a node in G_S, and each foreign key reference between relations is modeled as an edge in G_S. Each edge also has a weight, which defines the tightness of the relationship between the two relations. Given a keyword query Q = {k_1, k_2, ..., k_l}, the result of applying Q on D is a small database D′ satisfying the following conditions:

1. The set of relation names in D′ is a subset of the set of relation names in D.
2. For each relation R′ ∈ D′ that corresponds to relation R ∈ D, att(R′) ⊆ att(R) and tup(R′) ⊆ tup(R), where att(R) denotes the attributes of R and tup(R) denotes the tuples of R.
3. The tuples in D′ can be generated by expanding from the tuples that contain keywords in the query, following the foreign key references. They must satisfy the degree constraints and cardinality constraints.

Degree constraints define the attributes and relations in D′. They include (1) the maximum number of attributes in D′, and (2) the minimum weight of projection paths in the database schema graph G_S. Cardinality constraints define the set of tuples in D′. They include (1) the maximum number of tuples in D′, and (2) the maximum number of tuples for each relation in D′.

For example, for the DBLP database shown in Figure 2.2, consider a keyword query Q = {algorithms} with the constraint that the distance from any tuple to a tuple that contains the keyword in Q must be no larger than 2.
Then, the result is a database with the same schema as the original database. Tuples such as p_2 and p_3 are contained in the result because each is at distance 2 from the tuple p_4 that contains the keyword “algorithms”. Tuples such as a_1, a_2 and p_4 are not contained in the result because each is at a distance larger than 2 from any tuple that contains the keyword “algorithms”.

In Précis, a keyword query is processed in two steps. In the first step, the schema of the database D′ is generated such that all of the degree constraints are satisfied. This can be done by expanding from the relations that may contain the user-given keywords to the adjacent relations, following the foreign key references, until all degree constraints are satisfied. In the second step, each join edge defined in the schema of D′ is evaluated so as to satisfy all the cardinality constraints.

5.3.6 OTHER RELATED ISSUES

Jagadish et al. [2007] assert that the usability of a database is an important issue to address in database research. Enabling keyword queries on databases is one way to improve usability. Goldman et al. [1998] propose the notion of proximity search, which searches for objects in a database that are “near” other relevant objects. Here the database is represented as a graph, where nodes represent objects and edges represent relationships between the corresponding objects. Su and Widom [2005] propose to construct virtual documents offline, which serve as the answer units for a keyword query. Virtual documents are interconnected tuples from multiple relations. Queries are answered in a traditional IR style, where the virtual documents satisfying the query are returned. Nandi and Jagadish [2009] propose to represent the database, conceptually, as a collection of independent “queried units”, each of which represents the desired result of some query against the database.
Jayapandian and Jagadish [2008] present an automated technique to generate a good set of forms that together can express all possible queries, where each form is capable of expressing only a very limited range of queries. Talukdar et al. [2008] present a system with which a non-expert user can author new query templates and Web forms, to be used by anyone with related information needs. The query templates and Web forms are generated by a keyword query against interlinked source relations. Ji et al. [2009] study interactive keyword search on RDB, where the interaction is provided by autocompletion, which predicts a word or phrase that a user may type based on the partial query the user has entered. An answer defined in [Ji et al., 2009] is a single record in RDB. Li et al. [2009a] extend the autocompletion framework to the Steiner tree-based semantics for a keyword query. Chaudhuri and Kaushik [2009] study autocompletion with tolerated errors in a general framework, in which only autocompletions are computed, without query evaluation. Pu and Yu [2008, 2009] study the problem of query cleaning for keyword queries in RDB, where query cleaning involves semantic linkage and spelling corrections, followed by segmenting nearby query words into high-quality data terms. Guo et al. [2007] present an efficient algorithm to conduct topology search over biological databases. Shao et al. [2009b] present an effective workflow search engine, WISE, to find informative and concise search results, defined as the minimal views of the most specific workflow hierarchies containing keywords for a keyword query.

Bibliography

Sanjay Agrawal, Surajit Chaudhuri, and Gautam Das. DBXplorer: A system for keyword-based search over relational databases. In Proc. 18th Int. Conf. on Data Engineering, pages 5–16, 2002. DOI: 10.1109/ICDE.2002.994693

Shurug Al-Khalifa, Cong Yu, and H. V. Jagadish. Querying structured text in an XML database. In Proc. 2003 ACM SIGMOD Int. Conf. on Management of Data, pages 4–15, 2003. DOI: 10.1145/872757.872761

Sihem Amer-Yahia, Pat Case, Thomas Rölleke, Jayavel Shanmugasundaram, and Gerhard Weikum. Report on the DB/IR panel at SIGMOD 2005. SIGMOD Record, 34(4):71–74, 2005. DOI: 10.1145/1107499.1107514

Sihem Amer-Yahia and Jayavel Shanmugasundaram. XML full-text search: Challenges and opportunities. In Proc. 31st Int. Conf. on Very Large Data Bases, page 1368, 2005.

Andrey Balmin, Vagelis Hristidis, Nick Koudas, Yannis Papakonstantinou, Divesh Srivastava, and Tianqiu Wang. A system for keyword proximity search on XML databases. In Proc. 29th Int. Conf. on Very Large Data Bases, pages 1069–1072, 2003.

Andrey Balmin, Vagelis Hristidis, and Yannis Papakonstantinou. ObjectRank: Authority-based keyword search in databases. In Proc. 30th Int. Conf. on Very Large Data Bases, pages 564–575, 2004.

Zhifeng Bao, Tok Wang Ling, Bo Chen, and Jiaheng Lu. Effective XML keyword search with relevance oriented ranking. In Proc. 25th Int. Conf. on Data Engineering, pages 517–528, 2009. DOI: 10.1109/ICDE.2009.16

Gaurav Bhalotia, Arvind Hulgeri, Charuta Nakhe, Soumen Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using BANKS. In Proc. 18th Int. Conf. on Data Engineering, pages 431–440, 2002. DOI: 10.1109/ICDE.2002.994756

Sergey Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks, 30(1-7):107–117, 1998. DOI: 10.1016/S0169-7552(98)00110-X

Kaushik Chakrabarti, Venkatesh Ganti, Jiawei Han, and Dong Xin. Ranking objects based on relationships. In Proc. 2006 ACM SIGMOD Int. Conf. on Management of Data, pages 371–382, 2006. DOI: 10.1145/1142473.1142516