(Master's thesis) An Improvement Solution for Multiple Attribute Information Searching Based on Structured P2P Networks. Master's thesis in Information Technology

An Improvement Solution for Multiple Attribute Information Searching Based On Structured P2P Networks

Nguyen Thanh Dat
Faculty of Information Technology
Hanoi University of Engineering and Technology
Vietnam National University, Hanoi

Supervised by Doctor Nguyen Hoai Son

A thesis submitted in fulfillment of the requirements for the degree of Master of Information Technology

December, 2009

Table of Contents

1.3 P2P network models
1.3.1 P2P Network
1.3.2 P2P Network Models
1.3.2.1 Unstructured P2P Network
1.3.2.2 Hybrid P2P Network
1.3.2.3 Structured P2P Network
1.4 DHT-based Protocol
1.4.1 Distributed Hash Table - DHT
1.4.2 CHORD Protocol
1.4.2.1 Topology
1.4.2.2 Lookup and Insert
1.4.2.3 Join and Leave
1.4.2.4 Stabilization and Failure
1.5 Summary
Chapter 2 Related Works
2.1 INS/Twine: Information distribution based on Attribute-Value trees
2.1.1 Solution
2.1.2 System architecture
2.1.3 System architecture
2.1.4 Summary
2.2 CDS: Information Distribution Based On Load Balancing Matrix
2.2.1 Solution
2.2.2 System architecture
2.2.2.1 Registering a content name
2.2.3 System architecture
2.2.3.1 Query resolution
2.2.3.2 Summary
2.2.4 Load Balancing Matrix (LBM)
2.2.4.1 The structure of LBM
2.2.5 System architecture
2.2.5 LBM management mechanism
2.2.6 Summary
2.3 Data Indexing
2.3.1 Solution
2.3.2 Insert a file
2.3.3 Lookups
2.3.4 Summary
2.4 SMAV: Searching Multiple-attribute Value
2.4.1 Solution
2.4.2 Distribution of information content
2.4.3 Information content name query
2.4.4 Summary
Chapter 3 An Improvement Solution for Multiple-attribute Information Searching on Structured P2P Network
3.1 Idea
3.2 Three Levels Mapping Model
3.2.1 Overview
3.2.2 Three-Levels Sub-key Mapping Storing
3.2.3 Distribution of information content
3.2.4 Information query
3.2.5 Summary
3.3 The Dynamic Threshold Values
3.3.1 A formula of threshold values
3.3.2 Adjusted Distribution Algorithm
3.3.3 Updating Threshold Value Periodically
3.3.4 Adjusted Lookup Algorithm
3.4 Summary
Chapter 4 Simulations and Evaluations
4.1 Qualitative Evaluations
4.2 Simulation Description
4.3 Evaluation Based On Simulations
4.3.1 Load balancing
4.3.2 Distribution of information content
4.3.3 Routing Performance
Chapter 5 Conclusions and Future Work

List of Figures

1.1 Client/Server network model
1.2 Peer-To-Peer network model with peers
1.3 Locating resources in a Gnutella-like P2P environment
1.4 An example of Hybrid P2P Model
1.5 Distribution data progress based on DHT
1.6 Chord's key space with 2^3 points
1.7 Chord's components
1.8 Lookup progress of Chord's protocol with key
1.9 Joining phase of a node in CHORD protocol
2.1 Meta string and AV Tree
2.2 Architecture of INS/Twine System
2.3 Splitting a resource description into strands
2.4 The architecture of CDS system
2.5 An example of distribution of AVs in nodes
2.6 The structure of Load Balancing Matrix for {a_i, v_i}
2.7 An example of described data
2.8 Sample File Queries
2.9 Mappings between queries
2.10 Query mapping for three descriptors
2.11 An example of mappings tree
2.12 An example of a path of queries
2.13 Key - sub-key mappings
3.1 The number of hop levels of a content name in pure SMAV
3.2 Mappings created from key k_i
3.3 The generation of distributed keys from a content name
3.4 Block diagram of query progress of improved SMAV
3.5 An example of query progress with common keys
3.6 Combining of common keys and an uncommon key
4.1 The distribution of AV pairs in content names
4.2 Number of information contents stored in each of 5000 nodes
4.3 The number of queries processed by each of 5000 nodes
4.4 Load balancing among nodes
4.5 Mappings stored in every node
4.6 Mappings created by CNs in (a) DSMAV and (b) SMAV
4.7 The number of keys stored in every node
4.8 Level-k sub-keys created by three solutions
4.9 Logical hop count required for each query
4.10 The maximum number of hop levels of three solutions
4.11 The number of successful queries

List of Tables

1.1 Comparison: Client/Server vs P2P
1.2 Definition of variables for node n using m-bit identifiers
2.1 Mapping table between distributed key and content names
2.2 Mapping table between distributed key and sub-keys
2.3 Mapping table between distributed key and uncommon keys
3.1 Mapping Table between distributed key and content names
3.2 Mapping table between level-2 sub-keys and distributed keys
3.3 Mapping Table between keys and contents

Abstract

Conventional information search engines such as Google, Yahoo and Wikipedia support only keyword-based searching on websites. They cannot search information in other kinds of resources, such as personal devices (laptops, PDAs, cell phones) or files shared in P2P networks. DHT-based P2P networks such as Chord, CAN and Pastry can answer exact queries (i.e., queries for an exact key) with scalability, efficiency and fault tolerance. However, for complex queries such as range queries or multiple-attribute queries, a pure DHT is not efficient, since many query messages must be sent. In this thesis, we focus on multiple-attribute queries on DHT-based P2P networks. The main problem here is the load imbalance among nodes due to the appearance
of common attribute/value pairs (AV pairs) in content names. The main idea of our method is to limit the number of content items assigned to an ID to the threshold value of each node, by creating sub-IDs from multiple AV pairs when those AV pairs appear in many content names. To reduce query cost, our system also keeps the mapping between an ID and its sub-IDs, if any, in the node responsible for the ID. Moreover, we store on nodes only the mappings that are created during the distribution process. Our method achieves both efficiency and a good degree of load balancing, even when the distribution of AV pairs is skewed. Our simulation results show the efficiency of our solution in terms of lookup time and degree of load balancing.

Chapter 1 Introduction

1.1 Overview and Motivation

With the unprecedented growth of information technology, information now appears everywhere. It can be found in various kinds of resources, such as personal devices (laptops, PDAs, cell phones), websites on the Internet, and files shared in P2P networks. With this explosion of information, the demand for information searching keeps growing. Every day we need a great deal of information to communicate and work efficiently and easily. For instance, we search for weather forecasts before a trip or a picnic; we also search for the latest news of the day, references for a product to buy, market prices, etc. In many cases, if we obtain the desired information quickly and exactly, we have better chances of success in communication and work. Therefore, information searching is a necessity in today's information age. The emergence of new applications and services will require efficient information searching systems that can perform complex queries on content names in a scalable manner (W. Adjie-Winoto & Lilley, 1999; Carzaniga & Wolf, 2001; Foster & Tuecke, 2002). There are many large systems that allow searching
information, such as the conventional search engines Google and Yahoo, and sites such as Amazon, eBay and Wikipedia. The Google engine allows users to search information on the Internet based on keywords. It can link to billions of websites: the information of each website is described by keywords, which are then processed and stored on Google's servers. Conventional search systems often use the client/server model, in which servers provide searching services to clients. However, the client/server model has some disadvantages. Firstly, its scalability is limited: servers are expensive because they need very large processing and storage capacity. Secondly, each server may be a single point of failure; when a server goes down, operations cease. Moreover, as the number of simultaneous client requests to a given server increases, the server can become overloaded, and when a large number of clients join the network, traffic congestion also becomes an issue.

Recently, the Peer-to-Peer (P2P) network model has attracted the interest of many people. With their decentralized control, self-organization and adaptation, P2P systems have emerged as a significant social and technical phenomenon over the last years. Unlike the client/server model, P2P networks aim to aggregate large numbers of computers that join and leave the network frequently. In pure P2P systems, individual computers communicate directly with each other and share information and resources without using dedicated servers. For example, they provide infrastructure for communities that share CPU cycles (e.g., SETI@Home, Entropia) and/or storage space (e.g., Napster (Idit Keidar, 2006; Napster, 1999), FreeNet, Gnutella (Gnutella, 1999)), or that support collaborative environments (Groove). In P2P networks, all clients provide resources, including bandwidth, storage space and computing power. If more and more nodes join the system, the total capacity of
the system increases accordingly. This is not true of the client/server model with a fixed set of servers, in which adding more clients may mean slower data transfer for all users. The distributed nature of P2P networks also increases robustness against failures, by replicating data over multiple peers and by enabling peers to find data without relying on a centralized index server. In the latter case, there is no single point of failure in the system.

Information searching on P2P networks has attracted attention in recent years. The advantages of the P2P model allow us to build information searching systems that are scalable and fault-tolerant, because the data of the whole system are distributed over all nodes: each node is responsible for a portion of the data and takes part in the search process. The Gnutella network (Gnutella, 1999) supports file sharing and searching. It searches data by flooding messages to the whole network. Nevertheless, the Gnutella network has a high overhead, and a search may fail because a query may not be routed to the node responsible for the desired information. Hence, information searching is inefficient. The eDonkey (Weikum, 2002) network

Chapter 4 Simulations and Evaluations

Secondly, the lookup process depends on the total number of mappings in the system: the more mappings there are, the more exact the routing process. With pure SMAV, the total number of mappings in the system also contains redundant mappings, which are created by mapping distributed keys to uncommon keys. Our solution limits these redundant mappings by removing the mappings between distributed keys and uncommon keys. In our system, the total number of mappings is therefore

$\sum_i C_i + (N - N_1) + \sum_{k \ge 2} (N_{k-1} - N_k)$

where $N - N_1$ is the number of mappings between distributed keys and content names at level 1, $\sum_{k \ge 2} (N_{k-1} - N_k)$ is the number of mappings between distributed keys and content names at level k (k > 1), and $\sum_i C_i$ is the number of mappings between distributed keys and level-k sub-keys (k > 1).

The efficiency of the lookup algorithm does not depend on mappings alone; it also depends on the commonness of AV pairs. If a query name contains at least one uncommon AV pair, the query result is obtained with a single query message, sent directly to the node responsible for that AV pair. Otherwise, the lookup process depends on more sub-keys. In addition, the number of hops taking part in the lookup process reflects the efficiency of the lookup algorithm in terms of lookup time. In our solution, the maximum number of hop levels, i.e., the number of nodes that take part in the routing process, is small: one node is responsible for routing to level-2 sub-keys and one node for routing to level-k sub-keys (k > 2). Another parameter, the hop count, is the total number of nodes that take part in the lookup and routing process; it depends on the lookup process of the Chord protocol.

Thirdly, the storage load of every node is evaluated based on the information contents distributed to the nodes of the network. The threshold value of every node, which is defined by the threshold formula (p. 58) and updated periodically (Section 3.3.3), dynamically keeps the number of content names of every node within its load capacity, and hence keeps the load balanced among nodes.

However, these are only qualitative evaluations based on our theory. In the next section, we describe and implement a simulation program based on the conventional solution, the pure SMAV solution and our solution. The results obtained from the simulation program are then used as quantitative evaluations.

4.2 Simulation Description

To show the effectiveness of our system, we built a simulation program in C# which simulates three solutions of information content distribution and searching:

• A conventional solution. Because conventional solutions such as INS/Twine, CDS and Data Indexing suffer from load imbalance
among nodes because of common keys, which is also a problem of DHT-based protocols, we implement the Chord protocol as the conventional solution to compare with the two solutions below. Information contents are distributed based on the hash value of each AV pair in the content name: a content name is hashed to a set of distributed keys, and each distributed key, together with the content name, is sent to the node responsible for that distributed key. A query on multiple AV pairs is carried out by randomly choosing one AV pair in the query name to create a query key and sending a query message to the node responsible for the query key.
• Distribution and query of information contents based on the pure SMAV algorithm.
• Distribution and query of information contents based on our algorithm, which is called DSMAV.

The resource names are generated based on the Zipf distribution, which reflects the popularity of an AV pair through a parameter called its rank: the probability that an AV pair appears in a resource name is proportional to $1/r^{a}$, where r is the rank of the AV pair and a is a constant. Our simulation program assigns attributes and values to a content name randomly, with an average of 10 AV pairs per content name, which is a reasonable number for specifying information content in practice. The simulation program is implemented with the following parameters:

• The number of simulated nodes: 5,000
• The number of content names: 20,000
• The number of queries: 5,000
• The total number of AV pairs in all content names: 150,073
• The number of distinct AV pairs that appear in content names: 69,507

Moreover, we define other parameters as follows. The load capacity of each node is chosen randomly. The number of AV pairs of a content name is calculated based on the Zipf distribution. In our simulations, if a node is overloaded, new data are rejected. The next section shows charts of our simulation results. To show more clearly the
compared results, we zoom in on some charts. For example, in Figure 4.2 we simulate the distribution of AV pairs in content names with 5000 nodes; however, only 140 nodes are drawn in the figure instead of 5000.

4.3 Evaluation Based On Simulations

With the simulations, our solution is evaluated based on three factors: load balancing, distribution of information content and lookup of information. For each factor, we compare three solutions: the conventional, pure SMAV and DSMAV solutions. In the SMAV simulation, we chose 10 as the value of $N_{max}$. The following are our results and evaluations.

4.3.1 Load balancing

The load balance of the system is evaluated based on the number of information contents stored in each node and the number of queries processed by each node.

Figure 4.1: The distribution of AV pairs in content names (x-axis: AV pair ID)

Figure 4.1 shows the distribution of AV pairs over the set of content names used in the simulation. The term "AV pair of ID" in the figure means the order number of an AV pair. Because AV pairs are generated based on Zipf's law, some AV pairs occur with high probability. In particular, among the 150,073 AV pairs, there is an AV pair that appears in 6.4% of all content names. This means that if we use the conventional distribution method, the node responsible for the key assigned to that AV pair has to store 6.4% of the total number of content names (Figure 4.2). With the pure SMAV system, the maximum number of content names stored in a node is 0.6% of the total number of content names, whereas in our DSMAV system it does not exceed 0.5% (Figure 4.2). As a result, we consider that our DSMAV system achieves a good degree of load balancing in the distribution of information contents.

Figure 4.2: Number of information contents stored in each of 5000 nodes (x-axis: order of node)

In Figure 4.3, we can see that with the DSMAV distribution method, the maximum number of query lookups that a node is responsible for is 36,294. With the pure SMAV distribution method, the maximum number of query lookups of a node is 9,357, while with the conventional distribution method it is 1,775,135. This means that the query load of the DSMAV solution is distributed more equally than that of the conventional solution. However, in our solution a few nodes, which are responsible for level-2 sub-keys, have to process more queries than others. Although the query load of these nodes is higher, the query load of the other nodes is reduced. This is the cost we have to pay to reduce the lookup time of the system.

Figure 4.3: The number of queries processed by each of 5000 nodes (x-axis: order of node)

Figure 4.4 shows the load balancing of the system. In the conventional solution, 26 nodes are overloaded, i.e., 0.5% of all nodes. With the pure SMAV solution, 7 nodes are overloaded (0.1%). In our solution, only 1 node is overloaded (0.02%). This means that the DSMAV solution keeps a better degree of load balancing than the conventional and SMAV solutions.

Figure 4.4: Load balancing among nodes (x-axis: order of node)

In general, load balancing among nodes depends on the information distribution and lookup algorithms. Here the loads consist of storage and query loads. The storage load depends on the distribution algorithm, and the query
load depends on the lookup algorithm. In the next section, we evaluate these algorithms.

4.3.2 Distribution of information content

We believe that a distribution method is effective if mappings and keys are distributed to every node equally. In that case, the mappings created by all content names of the system are spread evenly over the nodes, which helps every node keep its share of mappings, and therefore its query load, balanced. An effective distribution algorithm limits the number of mappings and keys created by a content name. In this section, we therefore evaluate the distribution algorithms based on the distribution of mappings and keys among nodes.

Figure 4.5 presents the distribution of mappings among nodes. In the conventional solution, the maximum number of mappings that a node is responsible for is 4,474, which is 6.4% of all keys of the system. This is easy to understand, since the number of mappings corresponds to the number of occurrences of a key. With the pure SMAV and DSMAV solutions, the maximum numbers of mappings that a node is responsible for are 1,257 and 1,248 respectively, about 1.8% of the whole system. This means that the mappings of DSMAV are distributed more efficiently than those of the conventional solution. Moreover, our solution is more significant in practice than the pure SMAV solution, because threshold values, which are determined automatically based on keys and the load capacity of a node, limit the number of mappings created from common keys.

Besides, our solution also limits the number of mappings created by a content name (Figure 4.6). In the DSMAV solution, content names that create at most 10 mappings make up 90% of all content names, whereas in the pure SMAV solution they make up 81%. In the
case of the DSMAV solution, content names that create more than 30 mappings make up only 1.8% of all content names; with the pure SMAV solution, this percentage is 4.6%. Thus, based on dynamic threshold values, our solution significantly limits the maximum number of mappings created from common keys. However, our solution creates more keys than the conventional solution.

Figure 4.5: Mappings stored in every node (x-axis: order of node)

Figure 4.6: Mappings created by CNs in (a) DSMAV and (b) SMAV (x-axis: number of mappings; y-axis: percentage of CNs)

We can see in Figure 4.7 that the three solutions distribute all keys to nodes at similar rates. In the conventional solution, the maximum number of keys that a node is responsible for is 119. Because the SMAV and DSMAV solutions create new keys from common keys, they have more keys than the conventional solution. In the pure SMAV solution, the maximum number of keys of a node is 231. Our solution does better than pure SMAV: the maximum number of keys that a node is responsible for is 219. The average numbers of keys per node for the conventional, pure SMAV and DSMAV solutions are 14, 23 and 19, and the total numbers of keys are 69,508, 115,709 and 93,055. This means that in our solution the number of new keys created from content names and common keys is smaller than in the pure SMAV solution; it is greatly reduced thanks to dynamic threshold values. Our solution also significantly reduces the number of level-k sub-keys (k > 1) (Figure 4.8): the total number of keys at each level in the DSMAV solution is smaller than in the pure SMAV solution. This means that the content names in the DSMAV solution are distributed to nodes more efficiently.

Figure 4.7: The number of keys stored in every node (x-axis: order of node)

Figure 4.8: Level-k sub-keys created by three solutions

By distributing mappings and keys to nodes evenly, our distribution algorithm performs more efficiently than the conventional and pure SMAV solutions. As a result, the query process runs more exactly and completely than in the other two solutions. In the next section, we evaluate the routing performance of the query process.

4.3.3 Routing Performance

The result of the simulation of the logical hop count required for each query is shown in Figure 4.9. With the DSMAV distribution method, the average hop count required for a query is 9.8 hops, while with the conventional distribution method it is 9.3 hops. This is because in the DSMAV system, before sending a query message, a query node must decide which key to use as the destination of the query message; furthermore, a query message may be forwarded several times. However, the average hop count increases by only 0.5 hops compared with the conventional distribution method. Moreover, with the pure SMAV solution, the average hop count required for a query is 12.5. As a result, compared with pure SMAV, our solution improves the query process by reducing the average hop count of each query, which reduces the lookup time of the system. This means that our DSMAV system achieves a reasonable routing performance.

Figure 4.9: Logical hop count required for each query (x-axis: order of query name)

The hop level is defined as the number of nodes that forward query messages to other nodes in the routing process. Figure 4.10 shows the hop levels of the three solutions. In the conventional solution, a query node sends the query message directly to the node
responsible for the queried key, so only the query node itself forwards query messages to other nodes. In the pure SMAV solution, the number of nodes that forward query messages to other nodes depends on the maximum level of a sub-key, and the maximum number of hop levels grows accordingly. In the DSMAV solution, because we limit the number of nodes that forward query messages to other nodes, the maximum number of hop levels is smaller (Figure 4.10). This means that our DSMAV system achieves an efficient query process.

Figure 4.10: The maximum number of hop levels of three solutions (x-axis: order of query names)

Figure 4.11: The number of successful queries

The efficiency of the lookup algorithm is also shown in Figure 4.11, which gives the percentage of successful queries in the three solutions. Because of the unequal distribution of content names in the conventional solution, its percentage of successful queries is low, only 57.1%, while with the SMAV and DSMAV solutions the percentages of successful queries are 99.9% and 100%. The lookup algorithm of the DSMAV solution is thus reliable.

In summary, the lookup algorithm of the DSMAV solution performs reasonably, efficiently and reliably: the average hop count is only 0.5 hops higher than in the conventional solution, the maximum number of hop levels is smaller than in pure SMAV, and the percentage of successful queries is 100%.

Chapter 5 Conclusions and Future Work

DHT-based P2P networks such as Chord, CAN and Pastry can answer exact queries with scalability, efficiency and fault tolerance. However, for complex queries such as range queries or multiple-attribute queries, a pure DHT is not efficient, since a lot of
query messages must be sent. Some proposed solutions for multiple-attribute information searching, such as INS/Twine, CDS or Data Indexing, support range queries or multiple-attribute queries. However, these conventional approaches suffer from load imbalance among nodes caused by the appearance of common AV pairs in content names (i.e., AV pairs that appear in many content names). Moreover, the cost and time of lookup in some conventional solutions, such as INS/Twine and Data Indexing, are problems that multiple-attribute information searching systems have to face.

In this thesis, we have proposed an improvement solution for multiple-attribute information searching based on structured P2P networks. Our solution limits the number of information contents of a key to the threshold value of a node. If the number of information contents of a key exceeds the threshold value of a node, they are distributed to other nodes based on sub-keys, which are created from two or more AV pairs. We also create and store mappings from information contents and then use these mappings for routing in the query process. Mappings are stored in the nodes responsible for level-2 sub-keys.

Our solution achieves a good degree of load balancing even when the distribution of AV pairs is skewed. The average numbers of stored information contents and processed queries per node are smaller than in conventional solutions. Our solution also achieves a reasonable routing performance by maintaining the mappings between level-2 sub-keys and distributed keys. In the query process, the maximum number of hops is smaller than in the SMAV solution. Simulation results show the efficiency of the solution in terms of querying performance: the percentage of successful queries is 100%, which is better than the conventional solutions in our simulation.

Our solution is suitable for
systems in which the set of attributes is large enough, such as location-based information searching systems or library information systems. If the number of attributes is small, the probability that keys created from common keys themselves become common keys is very high. Hence, there are not enough keys to create new distributed keys from the last common key of a content name. As a result, the node responsible for the last common key has to store all the content names that correspond to that key; this node may then be overloaded and cause the loss of information.

Besides, the load balancing of our solution is evaluated based on content storage and query load. In a real network, the query load also depends on the query capacity of each node. We can add more threshold values to limit the number of queries of each node. In addition, some load balancing solutions that allow moving the position (ID) of a node can also be applied to improve content storage balancing. This is a limitation of our solution. For the above reasons, in future work we will focus on improving our solution to achieve query load balancing. We will also implement and evaluate our solution in real network systems.

Bibliography

A. Rowstron, P. D. (2001). Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. IFIP/ACM International Conference on Distributed Systems Platforms.
Carzaniga, D. R., & Wolf, A. (2001). Design and evaluation of a wide-area event notification service. ACM Transactions on Computer Systems.
Foster, C. Kesselman, J. N., & Tuecke, S. (2002). Grid services for distributed system integration. IEEE Computer, 36.
Gao, P. S. (2004). Design and evaluation of a distributed scalable content discovery system. IEEE Journal on Selected Areas in Communications.
Garces-Erice, P. A. Felber, E. B., & Ross, G. U.-K. K. (2004). Data indexing in peer-to-peer DHT networks. 24th International Conference on Distributed Computing Systems.
Gnutella (1999). Gnutella website: http://gnutella.wego.com, http://en.wikipedia.org/wiki/gnutella.
Idit Keidar, R. M. (2006). Evaluating unstructured peer-to-peer lookup overlays.
K. Gummadi, R. Gummadi, S. G., Ratnasamy, S., Shenker, S., & Stoica, I. (2003). The impact of DHT routing geometry on resilience and proximity. SIGCOMM'03.
M. Balazinska, H. Balakrishnan, D. K. (2002). INS/Twine: A scalable peer-to-peer architecture for intentional resource discovery. International Conference on Pervasive Computing.
Napster (1999). Napster website: http://www.napster.com, http://en.wikipedia.org/wiki/napster.
Nguyen Hoai Son, H. S. D. (2008). A solution for multiple attribute information searching on structured P2P network. College of Technology, Vietnam National University, Hanoi.
S. Ratnasamy, P. Francis, M. H., & Karp, R. (2001). A scalable content-addressable network. ACM SIGCOMM'01.
Stoica, R. Morris, D. K., Kaashoek, M., & Balakrishnan, H. (2001). Chord: A scalable peer-to-peer lookup service for internet applications. ACM SIGCOMM'01.
W. Adjie-Winoto, E. Schwartz, H. B., & Lilley, J. (1999). The design and implementation of an intentional naming system. ACM Symposium on Operating Systems Principles.
Weikum, G. (2002). Peer-to-peer information systems.
Y. Mrakawa, H. Minami, M. M., Yamaguchi, M., & Saito, H. (2005). DHT based peer-to-peer search system using pseudo-candidate key indexing. IEICE Transactions on Communications, J88-B.

... as a distributed information discovery solution. It is also based on a structured P2P architecture, with protocols such as Chord and CAN. This means that information contents are transformed into a set...
a content name forms an attribute/value tree, and each strand of an AV tree is mapped to a key. Information content is then distributed to each node responsible for the key based on DHT routing.
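The strand-based distribution described in this fragment can be illustrated with a minimal Python sketch. It is not the INS/Twine implementation and not the method of this thesis; it assumes, for illustration only, that a content name is given as a nested attribute/value tree, that a strand is a full root-to-leaf path, that keys are SHA-1 hashes reduced to a small hypothetical `M_BITS`, and that nodes sit on a Chord-style ring where each key is stored at its successor. The names `strands`, `strand_key`, `successor` and `distribute` are hypothetical.

```python
import hashlib
from bisect import bisect_left

# Hypothetical identifier size; real Chord deployments use 160-bit SHA-1 IDs.
M_BITS = 16


def strands(tree, prefix=()):
    """Yield every root-to-leaf attribute/value path ("strand") of an AV tree.

    Assumed tree shape: {attribute: (value, subtree)}, where subtree has the
    same shape and an empty dict marks a leaf.
    """
    for attr, (value, subtree) in sorted(tree.items()):
        path = prefix + (attr, value)
        if subtree:
            yield from strands(subtree, path)
        else:
            yield path


def strand_key(strand, m=M_BITS):
    """Hash a strand to an m-bit identifier (SHA-1 reduced modulo 2**m)."""
    text = "/".join(strand).encode("utf-8")
    return int.from_bytes(hashlib.sha1(text).digest(), "big") % (2 ** m)


def successor(node_ids, key):
    """Return the first node ID at or after `key` on the ring, wrapping around."""
    ring = sorted(node_ids)
    i = bisect_left(ring, key)
    return ring[i % len(ring)]


def distribute(content_name, tree, node_ids, store):
    """Store `content_name` at the node responsible for each of its strand keys."""
    for s in strands(tree):
        node = successor(node_ids, strand_key(s))
        store.setdefault(node, set()).add(content_name)
    return store


# Usage: a hypothetical camera description with three strands.
camera = {
    "resource": ("camera", {
        "manufacturer": ("acme", {}),
        "resolution": ("1080p", {}),
    }),
    "building": ("ne43", {}),
}
print(list(strands(camera)))
# [('building', 'ne43'), ('resource', 'camera', 'manufacturer', 'acme'),
#  ('resource', 'camera', 'resolution', '1080p')]
store = distribute("camera-1", camera, [100, 20000, 40000, 60000], {})
```

Because identical strands always hash to the same key, every content name that shares a popular strand lands on the same node; this is the load-imbalance problem with common AV pairs that the thesis addresses with sub-keys and threshold values.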
