P1: 57 Yu WL040/Bidgolio-Vol I WL040-Sample.cls June 20, 2003 17:52 Char Count= 0 WEB SEARCH TECHNOLOGY 750

document selector is to utilize the fact that most search engines return retrieved results in groups. Usually, only the top 10 to 20 results are returned in the first result page, but the user can make additional requests for more result pages and more results. Hence, a document selector may ask each search engine to return the first few result pages. This method tends to return the same number of pages from each selected search engine. Since different search engines may contain different numbers of useful pages for a given query, retrieving the same number of pages from each search engine is likely to cause over-retrieval from less useful databases and under-retrieval from highly useful databases. More elaborate document selection methods tie the number of pages to retrieve from a search engine to the ranking score (or the rank) of the search engine relative to the ranking scores (or ranks) of other search engines. This leads to proportionally more pages being retrieved from search engines that are ranked higher or have higher ranking scores. This type of approach is referred to as a weighted allocation approach in Meng et al. (2002). For each user query, the database selector of the metasearch engine computes a rank (i.e., 1st, 2nd, ...) and a ranking score for each local search engine. Both the rank information and the ranking score information can be used to determine the number of pages to retrieve from different local search engines. For example, in the D-WISE system (Yuwono & Lee, 1997), the ranking score information is used. Suppose for a given query q, r_i denotes the ranking score of the local database D_i, i = 1, ..., k, where k is the number of selected local databases for the query, and α = r_1 + ... + r_k denotes the total ranking score of all selected local databases.
D-WISE uses the ratio r_i/α to determine how many pages should be retrieved from D_i. More precisely, if m pages across these k databases are to be retrieved, then D-WISE retrieves m · r_i/α pages from database D_i. An example system that uses the rank information to select documents is CORI Net (Callan et al., 1995). Specifically, if m is the total number of pages to be retrieved from k selected local search engines, then m · 2(1 + k − i)/(k(k + 1)) pages are retrieved from the ith ranked local database, i = 1, ..., k. Since 2(1 + k − u)/(k(k + 1)) > 2(1 + k − v)/(k(k + 1)) for u < v, more pages will be retrieved from the uth ranked database than from the vth ranked database. Because the quantities 2(1 + k − i)/(k(k + 1)) sum to 1 over i = 1, ..., k, exactly m pages will be retrieved from the k top-ranked databases. In practice, it may be wise to retrieve slightly more than m pages from local databases in order to reduce the likelihood of missing useful pages. It is possible to combine document selection and database selection into a single integrated process. In Database Selection, we described a method for ranking databases in descending order of the estimated similarity of the most similar document in each database for a given query. A combined database selection and document selection method for finding the m most similar pages based on these ranked databases was proposed in Yu et al. (1999). This method is sketched below. First, for some small positive integer s (e.g., s can be 2), each of the s top-ranked databases is searched to obtain the actual global similarity of its most similar page. This may require some locally top-ranked pages to be retrieved from each of these databases. Let min_sim be the minimum of these s similarities.
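The two allocation formulas above, D-WISE's score-proportional allocation and CORI Net's rank-based allocation, can be sketched as follows. The scores and page count m are illustrative values, not taken from either system:

```python
# Sketches of the two document-selection (allocation) schemes described
# above. Engine scores and m are illustrative values.

def dwise_allocation(scores, m):
    """D-WISE-style weighted allocation: retrieve m * r_i / alpha pages
    from database D_i, where alpha is the sum of all ranking scores."""
    alpha = sum(scores)
    return [m * r / alpha for r in scores]   # fractional; round as needed

def cori_allocation(k, m):
    """CORI Net-style rank-based allocation: the i-th ranked database
    (i = 1..k) contributes m * 2(1 + k - i) / (k(k + 1)) pages."""
    return [m * 2 * (1 + k - i) / (k * (k + 1)) for i in range(1, k + 1)]

scores = [0.5, 0.3, 0.2]             # hypothetical ranking scores
print(dwise_allocation(scores, 10))  # proportional to the scores
print(cori_allocation(3, 12))        # [6.0, 4.0, 2.0]
# In both schemes the allocations sum to m (before rounding).
```

Note that both schemes distribute exactly m pages in total, which matches the property stated above for CORI Net's rank-based formula.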
Next, from these s databases, retrieve all pages whose actual global similarities are greater than or equal to min_sim. If m or more pages have been retrieved, then sort them in descending order of similarities, return the top m pages to the user, and terminate this process. Otherwise, the next top-ranked database (i.e., the (s + 1)th ranked database) is considered and its most similar page is retrieved. The actual global similarity of this page is then compared with the current min_sim, and the minimum of these two similarities is used as the new min_sim. Then retrieve from these s + 1 databases all pages whose actual global similarities are greater than or equal to the new min_sim. This process is repeated until m or more pages are retrieved, and the m pages with the largest similarities are returned to the user. A seeming problem with this combined method is that the same database may be searched multiple times. In practice, this problem can be avoided by retrieving and caching an appropriate number of pages when a database is searched for the first time. In this way, all subsequent "interactions" with the database would be carried out using the cached results. This method has the following property (Yu et al., 1999): if the databases containing the m desired pages are ranked higher than other databases, and the similarity (or desirability) of the mth most similar (desirable) page is distinct, then all of the m desired pages will be retrieved while searching at most one database that does not contain any of the m desired pages.

Result Merging

Ideally, a metasearch engine should provide local system transparency to its users. From a user's point of view, such transparency means that the metasearch engine should behave like a regular search engine.
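Before turning to result merging, the combined database/document selection process described above can be expressed in code. This is a sketch under simplifying assumptions: each database is modeled as a list of its pages' actual global similarities in descending order, whereas a real system would fetch and cache pages incrementally; the example values are hypothetical.

```python
# Sketch of the combined database/document selection process (after
# Yu et al., 1999). Each database is modeled as a descending list of
# actual global similarities; real systems fetch these incrementally.

def combined_selection(ranked_dbs, m, s=2):
    """ranked_dbs: databases ranked by the estimated similarity of
    their most similar document (best database first)."""
    used = list(ranked_dbs[:s])
    min_sim = min(db[0] for db in used)   # minimum of the top similarities
    while True:
        # retrieve every page with similarity >= min_sim from used databases
        pool = [sim for db in used for sim in db if sim >= min_sim]
        if len(pool) >= m or len(used) == len(ranked_dbs):
            return sorted(pool, reverse=True)[:m]
        nxt = ranked_dbs[len(used)]       # consider the next ranked database
        used.append(nxt)
        min_sim = min(min_sim, nxt[0])    # its most similar page

dbs = [[0.9, 0.6, 0.2], [0.8, 0.7, 0.1], [0.5, 0.4]]  # hypothetical
print(combined_selection(dbs, m=4))  # [0.9, 0.8, 0.7, 0.6]
```

With s = 2, the first pass yields only the pages scoring at least 0.8; adding the third database lowers min_sim to 0.5, after which enough pages are available and the top m are returned.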
That is, when a user submits a query, the user does not need to be aware that multiple search engines may be used to process this query, and when the user receives the search result from the metasearch engine, the fact that the results are retrieved from multiple search engines should be hidden. Result merging is a necessary task in providing the above transparency. When merging the results returned from multiple search engines into a single result, pages in the merged result should be ranked in descending order of global similarities (or global desirabilities). However, the heterogeneities that exist among local search engines, and between the metasearch engine and the local search engines, make result merging a challenging problem. Usually, pages returned from a local search engine are ranked based on these pages' local similarities. Some local search engines make the local similarities of returned pages available to the user (as a result, the metasearch engine can also obtain the local similarities), while other search engines do not make them available. For example, Google and AltaVista do not provide local similarities, while Northern Light and FirstGov do. To make things worse, local similarities returned from different local search engines, even when made available, may be incomparable due to the use of different similarity functions and term-weighting schemes by different local search engines. Furthermore, the local similarities and the global similarity of the same page may be quite different, as the metasearch engine may use a similarity function different from those used in local systems. In fact, even if the same similarity function were used by all local systems and the metasearch engine, the local and global similarities of the same page may still be very different.
This is because some statistics used to compute term weights, for example the document frequency of a term, are likely to differ across systems. The challenge here is how to merge the pages returned from multiple local search engines into a single ranked list in a reasonable manner, in the absence of local similarities and/or in the presence of incomparable similarities. An additional complication is that retrieved pages may be returned by different numbers of local search engines. For example, one page could be returned by only one of the selected local search engines, and another may be returned by all of them. The question is whether and how this should affect the ranking of these pages. Note that when we say that a page is returned by a search engine, we really mean that the URL of the page is returned. One simple approach that can solve all of the above problems is to actually fetch/download all returned pages from their local servers and compute their global similarities in the metasearch engine. One metasearch engine that employs this approach for result merging is the Inquirus system (http://www.neci.nec.com/~lawrence/inquirus.html). Inquirus ranks pages returned from local search engines by analyzing the contents of the downloaded pages, and it employs a ranking formula that combines similarity and proximity matches (Lawrence & Lee Giles, 1998). In addition to being able to rank results based on desired global similarities, this approach has some other advantages (Lawrence & Lee Giles, 1998). For example, when attempting to download pages, obsolete URLs can be discovered. This helps to remove pages with dead URLs from the final result list. In addition, downloading pages on the fly ensures that pages will be ranked based on their current contents. In contrast, similarities computed by local search engines may be based on obsolete versions of Web pages.
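A toy version of this fetch-and-rerank idea is sketched below. The deliberately simple term-frequency scoring function stands in for Inquirus's actual similarity-plus-proximity formula, and the URLs and page texts are hypothetical:

```python
# Minimal sketch of the "fetch and re-rank" approach: score each
# downloaded page inside the metasearch engine itself. The scoring
# function is a simple term-frequency measure chosen for illustration.

def global_score(query, page_text):
    """Fraction of page words that match query terms (toy measure)."""
    terms = query.lower().split()
    words = page_text.lower().split()
    return sum(words.count(t) for t in terms) / (len(words) or 1)

def rerank(query, pages):
    """pages: mapping of URL -> downloaded page text. Pages with dead
    URLs would simply be absent, dropping them from the merged result."""
    scored = [(global_score(query, text), url) for url, text in pages.items()]
    scored.sort(reverse=True)
    return [url for _, url in scored]

pages = {"u1": "web search engines rank web pages",
         "u2": "cooking recipes and kitchen tips"}
print(rerank("web search", pages))  # ['u1', 'u2']
```

Because scoring happens after download, the ranking always reflects the current contents of each page, at the cost of the extra fetch time noted below.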
The biggest drawback of this approach is its slow speed, as fetching pages and analyzing them on the fly can be time consuming. Most result-merging methods utilize the local similarities or local ranks of returned pages to perform merging. The following cases can be identified:

Selected Databases for a Given Query Do Not Share Pages, and All Returned Pages Have Local Similarities Attached. In this case, each result page will be returned from just one search engine. Even though all returned pages have local similarities, these similarities may be normalized using different ranges by different local search engines. For example, one search engine may normalize its similarities between 0 and 1 and another between 0 and 1000. In this case, all local similarities should be renormalized based on a common range, say [0, 1], to improve the comparability of these local similarities (Dreilinger & Howe, 1997; Selberg & Etzioni, 1997). Renormalized similarities can be further adjusted based on the usefulness of different databases for the query. Recall that when database selection is performed for a given query, the usefulness of each database is estimated and represented as a score. The database scores can be used to adjust the renormalized similarities. The idea is to give preference to pages retrieved from highly ranked databases. In CORI Net (Callan et al., 1995), the adjustment works as follows. Let s be the ranking score of local database D and s_avg be the average of the scores of all searched databases for a given query. Then the following weight is assigned to D: w = 1 + k · (s − s_avg)/s_avg, where k is the number of databases searched for the given query. It is easy to see from this formula that databases with higher scores will have higher weights. Let x be the renormalized similarity of page p retrieved from D. Then CORI Net computes the adjusted similarity of p as w · x. The result merger lists returned pages in descending order of adjusted similarities.
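The CORI Net adjustment above can be sketched as follows; the database scores and the renormalized similarity x are illustrative values:

```python
# Sketch of CORI Net's score-based similarity adjustment. The database
# scores and the renormalized similarity x are illustrative values.

def cori_weight(s, scores):
    """w = 1 + k * (s - s_avg) / s_avg, where s is this database's score
    and scores are the scores of all k searched databases."""
    k = len(scores)
    s_avg = sum(scores) / k
    return 1 + k * (s - s_avg) / s_avg

scores = [0.6, 0.4, 0.2]        # hypothetical database ranking scores
w = cori_weight(0.6, scores)    # weight for the highest-scored database
x = 0.9                         # renormalized local similarity of page p
print(w, w * x)                 # adjusted similarity of p is w * x
```

A database scoring exactly the average gets weight 1 (its similarities pass through unchanged), while above-average databases have their pages boosted.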
A similar method is used in ProFusion (Gauch et al., 1996). For a given query, the adjusted similarity of a page p from a database D is the product of the renormalized similarity of p and the ranking score of D.

Selected Databases for a Given Query Do Not Share Pages, but Some Returned Pages Do Not Have Local Similarities Attached. Again, each result page will be returned by one local search engine. In general, there are two types of approaches for tackling the result-merging problem in this case. The first type uses the local rank information of returned pages directly to perform the merge. Note that in this case, local similarities that may be available for some returned pages are ignored. The second type first converts local ranks to local similarities and then applies the techniques described for the first case to perform the merge. One simple way to use rank information only for result merging is as follows (Meng et al., 2002). First, arrange the searched databases in descending order of usefulness scores. Next, a round-robin method based on the database order and the local page rank order is used to produce an overall rank for all returned pages. Specifically, in the first round, the top-ranked page from each searched database is taken and these pages are ordered based on the database order, such that the page order and the database order are consistent; if not enough pages have been obtained, the second round starts, which takes the second highest-ranked page from each searched database, orders these pages again based on the database order, and places them behind the pages selected earlier. This process is repeated until the desired number of pages is obtained. In the D-WISE system (Yuwono & Lee, 1997), the following method for converting ranks into similarities is employed.
For a given query, let r_i be the ranking score of database D_i, r_min be the smallest database ranking score, r be the local rank of a page from D_i, and g be the converted similarity of the page. The conversion function is g = 1 − (r − 1) · F_i, where F_i = r_min/(m · r_i) and m is the number of documents desired across all searched databases. This conversion has the following properties. First, all locally top-ranked pages have the same converted similarity, namely 1. Second, F_i is the difference between the converted similarities of the jth and the (j + 1)th ranked pages from database D_i, for any j = 1, 2, .... Note that this difference is larger for databases with smaller ranking scores. Consequently, if the rank of a page p in a higher-ranked database is the same as the rank of a page p′ in a lower-ranked database, and neither p nor p′ is top-ranked, then the converted similarity of p will be higher than that of p′. This property can lead to the selection of more pages from databases with higher scores into the merged result. As an example, consider two databases D_1 and D_2. Suppose r_1 = 0.2, r_2 = 0.5, and m = 4. Then r_min = 0.2, F_1 = 0.25, and F_2 = 0.1. Thus, the three top-ranked pages from D_1 will have converted similarities 1, 0.75, and 0.5, respectively, and the three top-ranked pages from D_2 will have converted similarities 1, 0.9, and 0.8, respectively. As a result, the merged list will contain three pages from D_2 and one page from D_1.

Selected Databases for a Given Query Share Pages. In this case, the same page may be returned by multiple local search engines. Result merging in this situation is usually carried out in two steps.
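Both rank-based approaches above, the round-robin merge and the D-WISE rank-to-similarity conversion, can be sketched as follows; the second part reproduces the worked example with r_1 = 0.2, r_2 = 0.5, and m = 4 (the result lists in the first part are hypothetical):

```python
# Sketches of the two rank-based merging approaches described above.

def round_robin_merge(db_results, m):
    """db_results: result lists in descending database-usefulness order,
    each list already in local rank order (best page first)."""
    merged, rnd = [], 0
    while len(merged) < m and any(rnd < len(r) for r in db_results):
        for results in db_results:          # databases in score order
            if rnd < len(results) and len(merged) < m:
                merged.append(results[rnd])
        rnd += 1
    return merged

def converted_similarity(r, r_i, r_min, m):
    """D-WISE conversion: g = 1 - (r - 1) * F_i with F_i = r_min / (m * r_i)."""
    return 1 - (r - 1) * (r_min / (m * r_i))

print(round_robin_merge([["a1", "a2", "a3"], ["b1", "b2"]], 4))
# ['a1', 'b1', 'a2', 'b2']

r1, r2, m = 0.2, 0.5, 4
print([converted_similarity(r, r1, min(r1, r2), m) for r in (1, 2, 3)])
# [1.0, 0.75, 0.5]
print([converted_similarity(r, r2, min(r1, r2), m) for r in (1, 2, 3)])
# [1.0, 0.9, 0.8]
```

Merging the converted similarities in descending order indeed yields three pages from D_2 and one from D_1, as stated above.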
In the first step, the techniques discussed in the first two cases can be applied to all pages, regardless of whether they are returned by one or more search engines, to compute their similarities for merging. In the second step, for each page p returned by multiple search engines, the similarities of p due to multiple search engines are combined in a certain way to generate a final similarity for p. Many combination functions have been proposed and studied (Croft, 2000), and some of these functions have been used in metasearch engines. For example, the max function is used in ProFusion (Gauch et al., 1996), and the sum function is used in MetaCrawler (Selberg & Etzioni, 1997).

CONCLUSION

In the past decade, we have all witnessed the explosion of the Web. By now, the Web has become the largest digital library, used by millions of people. Search engines and metasearch engines have become indispensable tools for Web users to find desired information. While most Web users have probably used search engines and metasearch engines, few know the technologies behind these wonderful tools. This chapter has provided an overview of these technologies, from basic ideas to more advanced algorithms. As can be seen from this chapter, Web-based search technology has its roots in text retrieval techniques, but it also has many unique features. Some efforts to compare the quality of different search engines have been reported (for example, see Hawking, Craswell, Bailey, & Griffiths, 2001). An interesting issue is how to evaluate and compare the effectiveness of different techniques. Since most search engines employ multiple techniques, it is difficult to isolate the effect of a particular technique on effectiveness, even when the effectiveness of the search engines can be obtained. Web-based search is still a young discipline, and it still has a lot of room to grow.
The upcoming transition of the Web from mostly HTML pages to XML pages will probably have a significant impact on Web-based search technology.

ACKNOWLEDGMENT

This work is supported in part by NSF Grants IIS-9902872, IIS-9902792, EIA-9911099, IIS-0208574, IIS-0208434, and ARO-2-5-30267.

GLOSSARY

Authority page: A Web page that is linked from hub pages in a group of pages related to the same topic.
Collection fusion: A technique that determines how to retrieve documents from multiple collections and merge them into a single ranked list.
Database selection: The process of selecting potentially useful data sources (databases, search engines, etc.) for each user query.
Hub page: A Web page with links to important (authority) Web pages, all related to the same topic.
Metasearch engine: A Web-based search tool that utilizes other search engines to retrieve information for its user.
PageRank: A measure of Web page importance based on how Web pages are linked to each other on the Web.
Search engine: A Web-based tool that retrieves potentially useful results (Web pages, products, etc.) for each user query.
Result merging: The process of merging documents retrieved from multiple sources into a single ranked list.
Text retrieval: A discipline that studies techniques to retrieve relevant text documents from a document collection for each query.
Web (World Wide Web): Hyperlinked documents residing on networked computers, allowing users to navigate from one document to any linked document.

CROSS REFERENCES

See Intelligent Agents; Web Search Fundamentals; Web Site Design.

REFERENCES

Bergman, M. (2000). The deep Web: Surfacing the hidden value. Retrieved April 25, 2002, from http://www.completeplanet.com/Tutorials/DeepWeb/index.asp
Callan, J. (2000). Distributed information retrieval. In W. Bruce Croft (Ed.), Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval (pp. 127–150). Dordrecht, The Netherlands: Kluwer Academic.
Callan, J., Connell, M., & Du, A. (1999). Automatic discovery of language models for text databases. In ACM SIGMOD Conference (pp. 479–490). New York: ACM Press.
Callan, J., Croft, W., & Harding, S. (1992). The INQUERY retrieval system. In Third DEXA Conference, Valencia, Spain (pp. 78–83). Wien, Austria: Springer-Verlag.
Callan, J., Lu, Z., & Croft, W. (1995). Searching distributed collections with inference networks. In ACM SIGIR Conference, Seattle (pp. 21–28). New York: ACM Press.
Chakrabarti, S., Dom, B., Raghavan, P., Rajagopalan, S., Gibson, D., & Kleinberg, J. (1998). Automatic resource compilation by analyzing hyperlink structure and associated text. In 7th International World Wide Web Conference, Brisbane, Australia (pp. 65–74). Amsterdam, The Netherlands: Elsevier.
Chakrabarti, S., Dom, B., Kumar, R., Raghavan, P., Rajagopalan, S., et al. (1999). Mining the Web's link structure. IEEE Computer, 32, 60–67.
Croft, W. (2000). Combining approaches to information retrieval. In W. Bruce Croft (Ed.), Advances in information retrieval: Recent research from the Center for Intelligent Information Retrieval (pp. 1–36). Dordrecht: Kluwer Academic.
Cutler, M., Deng, H., Manicaan, S., & Meng, W. (1999). A new study on using HTML structures to improve retrieval. In Eleventh IEEE Conference on Tools with Artificial Intelligence, Chicago (pp. 406–409). Washington, DC: IEEE Computer Society.
Dreilinger, D., & Howe, A. (1997). Experiences with selecting search engines using metasearch. ACM Transactions on Information Systems, 15, 195–222.
Fan, Y., & Gauch, S. (1999). Adaptive agents for information gathering from multiple, distributed information sources. In AAAI Symposium on Intelligent Agents in Cyberspace, Stanford University (pp. 40–46). Menlo Park, CA: AAAI Press.
Gauch, S., Wang, G., & Gomez, M. (1996).
ProFusion: Intelligent fusion from multiple, distributed search engines. Journal of Universal Computer Science, 2, 637–649.
Gravano, L., Chang, C., Garcia-Molina, H., & Paepcke, A. (1997). STARTS: Stanford proposal for Internet meta-searching. In ACM SIGMOD Conference, Tucson, AZ (pp. 207–218). New York: ACM Press.
Hawking, D., Craswell, N., Bailey, P., & Griffiths, K. (2001). Measuring search engine quality. Journal of Information Retrieval, 4, 33–59.
Hearst, M., & Pedersen, J. (1996). Reexamining the cluster hypothesis: Scatter/gather on retrieval results. In ACM SIGIR Conference (pp. 76–84). New York: ACM Press.
Kahle, B., & Medlar, A. (1991). An information system for corporate users: Wide area information servers (Tech. Rep. TMC199). Thinking Machines Corporation.
Kirsch, S. (1998). The future of Internet search: Infoseek's experiences searching the Internet. ACM SIGIR Forum, 32, 3–7. New York: ACM Press.
Kleinberg, J. (1998). Authoritative sources in a hyperlinked environment. In Ninth ACM-SIAM Symposium on Discrete Algorithms (pp. 668–677). Washington, DC: ACM-SIAM.
Koster, M. (1994). ALIWEB: Archie-like indexing in the Web. Computer Networks and ISDN Systems, 27, 175–182.
Lawrence, S., & Lee Giles, C. (1998). Inquirus, the NECI meta search engine. In Seventh International World Wide Web Conference (pp. 95–105). Amsterdam, The Netherlands: Elsevier.
Manber, U., & Bigot, P. (1997). The search broker. In USENIX Symposium on Internet Technologies and Systems, Monterey, CA (pp. 231–239). Berkeley, CA: USENIX.
Meng, W., Yu, C., & Liu, K. (2002). Building efficient and effective metasearch engines. ACM Computing Surveys, 34, 48–84.
Page, L., Brin, S., Motwani, R., & Winograd, T. (1998). The PageRank citation ranking: Bringing order to the Web (Technical Report). Stanford, CA: Stanford University.
Pratt, W., Hearst, M., & Fagan, L. (1999). A knowledge-based approach to organizing retrieved documents.
In Sixteenth National Conference on Artificial Intelligence (pp. 80–85). Menlo Park, CA: AAAI Press and Cambridge, MA: MIT Press.
Salton, G., & McGill, M. (1983). Introduction to modern information retrieval. New York: McGraw-Hill.
Selberg, E., & Etzioni, O. (1997). The MetaCrawler architecture for resource aggregation on the Web. IEEE Expert, 12, 8–14.
Wu, Z., Meng, W., Yu, C., & Li, Z. (2001). Towards a highly scalable and effective metasearch engine. In Tenth World Wide Web Conference (pp. 386–395). New York: ACM Press.
Yu, C., Meng, W., Liu, L., Wu, W., & Rishe, N. (1999). Efficient and effective metasearch for a large number of text databases. In Eighth ACM International Conference on Information and Knowledge Management (pp. 217–224). New York: ACM Press.
Yuwono, B., & Lee, D. (1997). Server ranking for distributed text resource systems on the Internet. In Fifth International Conference on Database Systems for Advanced Applications (pp. 391–400). Singapore: World Scientific.
P1: JDW Sahai WL040/Bidgolio-Vol I WL040-Sample.cls July 16, 2003 18:35 Char Count= 0

Web Services

Akhil Sahai, Hewlett-Packard Laboratories
Sven Graupner, Hewlett-Packard Laboratories
Wooyoung Kim, University of Illinois at Urbana-Champaign

Introduction 754
The Genesis of Web Services 754
Tightly Coupled Distributed Software Architectures 754
Loosely Coupled Distributed Software Architectures 755
Client Utility 755
Jini 755
TSpaces 755
Convergence of the Two Independent Trends 755
Web Services Today 755
Web Services Description 756
Web Services Discovery 756
Web Services Orchestration 757
Web Services Platforms 758
Security and Web Services 760
Single Sign-On and Digital Passports 760
Payment Systems for Web Services 762
The Future of Web Services 763
Dynamic Web Services Composition and Orchestration 764
Personalized Web Services 764
End-to-End Web Service Interactions 764
Future Web Services Infrastructures 765
Conclusion 766
Glossary 766
Cross References 766
References 766

INTRODUCTION

There were two predominant trends in computing over the past decade: (i) a movement from monolithic software to distributed objects and components and (ii) an increasing focus on software for the Internet. Web services (or e-services) are a result of these two trends. Web services are defined as distributed services that are identified by Uniform Resource Identifiers (URIs), whose interfaces and bindings can be defined, described, and discovered by eXtensible Markup Language (XML) artifacts, and that support direct XML message-based interactions with other software applications over the Internet. Web services that perform useful tasks often exhibit the following properties:

Discoverable: The foremost requirement for a Web service to be useful in commercial scenarios is that it can be discovered by clients (humans or other Web services).
Communicable: Web services adopt a message-driven operational model in which they interact with each other and perform specified operations by exchanging XML messages; the operational model is thus document-oriented. Some of the preeminent communication patterns used between Web services are synchronous, asynchronous, and transactional communication.
Conversational: Sending a document or invoking a method, and getting a reply, are the basic communication primitives in Web services. A sequence of these primitives that are related to each other (thus, a conversation) forms a complex interaction between Web services.
Secure and Manageable: Properties such as security, reliability, availability, and fault tolerance are critical for commercial Web services, as are manageability and quality of service.

As Web services gain critical mass in the information technology (IT) industry as well as academia, the dominant computing paradigm of software as a monolithic object-oriented application is gradually giving way to software as a service accessible via the Internet.

THE GENESIS OF WEB SERVICES

Contrary to general public perception, the development of Web services followed a rather modest evolutionary path. The underpinning technologies of Web services borrow heavily from object-based distributed computing and the development of the World Wide Web (Berners-Lee, 1996). In this chapter, we review related technologies that helped shape the notion of Web services.

Tightly Coupled Distributed Software Architectures

The study of various aspects of distributed computing dates back as early as the invention of time-shared multiprocessing.
Despite the early start, distributed computing remained impractical until the introduction of the Object Management Group's (OMG) Common Object Request Broker Architecture (CORBA) and Microsoft's Distributed Component Object Model (DCOM), a distributed extension to the Component Object Model (COM). Both CORBA and DCOM create an illusion of a single machine over a network of (heterogeneous) computers and allow objects to invoke remote objects as if they were on the same machine, thereby vastly simplifying object sharing among applications. They do so by building their abstractions on more or less OS- and platform-independent middleware layers. In these software architectures, objects define a number of interfaces and advertise their services by registering the interfaces. Objects are assigned identifiers at the time of creation. The identifiers are used for discovering their interfaces and their implementations. In addition, CORBA supports discovery of objects using descriptions of the services they provide. Sun Microsystems' Java Remote Method Invocation (Java RMI) provides similar functionality, where a network of platform-neutral Java virtual machines provides the illusion of a single machine. Java RMI is a language-dependent solution, though the Java Native Interface (JNI) provides language independence to some extent. The software architectures supported by CORBA and DCOM are said to be tightly coupled because they define their own binary message encodings, and thus objects are interoperable only with objects defined in the same software architecture; for example, CORBA objects cannot invoke methods on DCOM objects.
Also, it is worth noting that security was a secondary concern in these software architectures (although some form of access control is highly desirable), partly because method-level/object-level access control is too fine-grained and incurs too much overhead, and partly because these software architectures were developed for use within the boundary of a single administrative domain, typically a local area network.

Loosely Coupled Distributed Software Architectures

The proliferation and increased accessibility of diverse intelligent devices in today's IT market have transformed the World Wide Web into a more dynamic, pervasive environment. The fundamental changes in the computing landscape, from a static client-server model to a dynamic peer-to-peer model, encourage reasoning about interaction with these devices in terms of the more abstract notion of service rather than the traditional notion of object. For example, printing can be viewed as a service that a printer provides; printing a document is to invoke the print service on a printer rather than to invoke a method on a proxy object for a printer. Such services tend to be dispersed over a wide area, often crossing administrative boundaries, for better resource utilization. This physical distribution calls for more loosely coupled software architectures, where scalable advertising and discovery are a must and low-latency, high-bandwidth interprocessor communication is highly desirable. As a direct consequence, a number of service-centric middleware developments have come to light. We note three distinctive systems from the computer industry's research laboratories, namely, HP's client utility (e-Speak), Sun Microsystems' Jini, and IBM's TSpaces (listed here in alphabetical order). These have been implemented in Java for platform independence.

Client Utility

HP's client utility is a somewhat underpublicized system that became the launching pad for HP's e-Speak (Karp, 2001).
Its architecture represents one of the earlier forms of peer-to-peer system, which is suitable for Web service registration, discovery, and invocation (Kim, Graupner, & Sahai, 2002). The fundamental idea is to abstractly represent every element in computing as a uniform entity called "service" (or "resource"). Using this abstraction as a building block, it provides facilities for advertising and discovery, dynamic service composition, mediation and management, and capability-based fine-grained security. What distinguishes client utility most from the other systems is the fact that it makes advertisement and discovery visible to clients. Clients can describe their services using vocabularies and can specifically state what services they want to discover.

Jini

The Jini technology at Sun Microsystems is a set of protocol specifications that allows services to announce their presence and discover other services in their vicinity. It advocates a network-centric view of computing. However, it relies on the availability of multicast capability, practically limiting its applicability to services/devices connected with a local area network (such as a home network). Jini exploits Java's code mobility and allows a service to export stub code that implements a communication protocol using Java RMI. Joining, advertisement, and discovery are done transparently from other services. It has been developed mainly for collaboration within a small, trusted workgroup and offers limited security and scalability support.

TSpaces

IBM's TSpaces (TSpaces, 1999) is network middleware that aims to enable communication between applications and devices in a network of heterogeneous computers and operating systems. It is a network communication buffer with database capabilities, which extends Linda's Tuple space communication model with asynchrony. TSpaces supports hierarchical access control at the Tuple space level.
Advertisement and discovery are implicit in TSpaces and provided indirectly through shared Tuple spaces.

Convergence of the Two Independent Trends

Web services are defined at the crossing point of the evolution paths of service-centric computing and the World Wide Web. The idea is to provide service-centric computing using the Internet as the platform; services are delivered over the Internet (or an intranet). Since its inception, the World Wide Web has striven to become a distributed, decentralized, all-pervasive infrastructure where information is put out for other users to retrieve. It is this decentralized, distributed paradigm of information dissemination that, upon meeting the concept of service-centric computing, has led to the germination of the concept of Web services. The Web services paradigm has caught the fancy of the research and development community. Many computer scientists and researchers from IT companies as well as universities are working together to define concepts, platforms, and standards that will determine how Web services are created, deployed, registered, discovered, and composed, as well as how Web services will interact with each other.

P1: JDW Sahai WL040/Bidgolio-Vol I WL040-Sample.cls July 16, 2003 18:35 Char Count= 0

WEB SERVICES TODAY

Web services are appearing on the Internet in the form of e-business sites and portal sites. For example, priceline.com (http://www.priceline.com) and Expedia.com (http://www.expedia.com) act as brokers for airlines, hotels, and car rental companies. Through their portal sites they offer statically composed Web services that have a prenegotiated understanding with certain airlines and hotels. These are mostly business-to-consumer (B2C) Web services. A large number of technologies and platforms have appeared and been standardized so as to enable the paradigm of Web services to support business-to-business (B2B) and B2C scenarios alike in a uniform manner.
These standards enable creation and deployment, description, and discovery of Web services, as well as communication among them. We describe some preeminent standards below. The Web Services Description Language (WSDL) is a standard for describing service interfaces and publishing them together with services' access points (i.e., bindings) and supported interfaces. Once described in WSDL, Web services can be registered and discovered using Universal Description, Discovery, and Integration (UDDI). After having discovered its partners, a Web service uses the Simple Object Access Protocol (SOAP), which is in fact an incarnation of the Remote Procedure Call (RPC) in XML, over the HyperText Transfer Protocol (HTTP) to exchange XML messages and invoke the partners' services. Though most services are implemented using platform-independent languages such as Java and C#, development and deployment platforms are also being standardized; J2EE and .NET are two well-known ones. Web services and their users often expect different levels of security depending on their security requirements and assumptions. The primary means for enforcing security are digital signatures and strong encryption using the Public Key Infrastructure (PKI). SAML, XKMS, and XACML are some recently proposed security standards. Also, many secure payment mechanisms have been defined. (See Figure 1.)

Figure 1: Web services.

Web Services Description

In traditional distributed software architectures, developers use an interface definition language (IDL) to define component interfaces. A component interface typically describes the operations the component supports by specifying their inputs and expected outputs. This enables developers to decouple interfaces from actual implementations. As Web services are envisaged as software accessible through the Web by other Web services and users,
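To make the SOAP-as-XML-RPC idea concrete, the sketch below assembles a minimal SOAP 1.1 request envelope using only the standard library. The `GetQuote` operation and the `urn:example:quotes` namespace are invented for illustration; only the envelope namespace is the standard SOAP 1.1 one.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"

def build_soap_request(operation, params, ns="urn:example:quotes"):
    # Envelope and Body carry the standard SOAP 1.1 namespace;
    # the operation element lives in the service's own namespace.
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    op = ET.SubElement(body, f"{{{ns}}}{operation}")
    for name, value in params.items():
        ET.SubElement(op, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")

request = build_soap_request("GetQuote", {"symbol": "HPQ"})
```

In practice this XML string would be sent as the body of an HTTP POST (with a SOAPAction header), and the response would arrive as a similar envelope containing the result.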
Web services need to be described so that their interfaces are decoupled from their implementations. WSDL serves as an IDL for Web services. WSDL enables description of Web services independently of the message formats and network protocols used. In WSDL, a service is described as a set of endpoints. An endpoint is in turn a set of operations. An operation is defined in terms of messages received or sent out by the Web service:

Message—An abstract definition of the data being communicated, consisting of message parts.
Operation—An abstract definition of an action supported by the service. Operations are of the following types: one-way, request–response, solicit–response, and notification.
Port type—An abstract set of operations supported by one or more endpoints.
Binding—A concrete protocol and data format specification for a particular port type.
Port—A single endpoint defined as a combination of a binding and a network address.
Service—A collection of related endpoints.

As the implementation of the service changes or evolves over time, the WSDL definitions must be continuously updated and the descriptions versioned.

Web Services Discovery

When navigating the Web for information, we use keywords to find Web sites of interest through search engines. Oftentimes, useful links in search results are mixed with a lot of unnecessary ones that need to be sifted through. Similarly, Web services need to discover compatible Web services before they undertake business with them. The need for efficient service discovery necessitates some sort of Web services clearinghouse with which Web services register themselves. UDDI (http://www.uddi.org), supported by Ariba, IBM, Microsoft, and HP, is an initiative to build such a Web service repository; it is now under the auspices of OASIS (http://www.oasis-open.org).
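The containment hierarchy of these WSDL elements can be mirrored in a few plain data classes. The sketch below is not a WSDL parser; the "StockQuote" service, its operation, and all names are invented to show how messages roll up into operations, operations into port types, and bindings plus addresses into concrete ports.

```python
from dataclasses import dataclass, field

# Plain data classes mirroring WSDL's containment hierarchy.
# The sample "StockQuote" service and all names are illustrative only.

@dataclass
class Message:
    name: str
    parts: dict                  # part name -> XML Schema type

@dataclass
class Operation:
    name: str
    kind: str                    # one-way | request-response | solicit-response | notification
    input: Message = None
    output: Message = None

@dataclass
class PortType:
    name: str
    operations: list = field(default_factory=list)

@dataclass
class Binding:
    name: str
    port_type: PortType
    protocol: str                # e.g., "SOAP/HTTP"

@dataclass
class Port:
    name: str
    binding: Binding
    address: str                 # concrete network endpoint

@dataclass
class Service:
    name: str
    ports: list = field(default_factory=list)

req = Message("GetQuoteRequest", {"symbol": "xsd:string"})
resp = Message("GetQuoteResponse", {"price": "xsd:float"})
pt = PortType("QuotePortType", [Operation("GetQuote", "request-response", req, resp)])
binding = Binding("QuoteSoapBinding", pt, "SOAP/HTTP")
service = Service("StockQuoteService",
                  [Port("QuotePort", binding, "http://example.org/quote")])
```

Note how the abstract part (messages, operations, port types) is kept separate from the concrete part (bindings, ports): this is exactly the decoupling of interface from implementation that the text describes.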
These companies maintain public Web-based registries (operator sites), consistent with each other, that make available information about businesses and their technical interfaces and application program interfaces (APIs). A core component of the UDDI technology is the registration, an XML document defining a business and the Web services it provides. There are three parts to the registration, namely a white page for name, address, contact information, and other identifiers; a yellow page for classification of a business under standard taxonomies; and a green page that contains technical information about the Web services being described. UDDI also specifies a set of APIs for publication and inquiry. The inquiry APIs are for browsing information in a repository (e.g., find_business, get_businessDetail). The publication APIs are for business entities to put their information into a repository. E-marketplaces have been an important development in the business transaction arena on the Internet. They are a virtual meeting place for market participants (i.e., Web services). In addition to basic registration and discovery, e-marketplaces offer their participants a number of value-added services, including the following:

Enabling inter-Web service interaction after the discovery (the actual interaction may happen with or without the direct participation of the e-marketplace);
Enabling supply and demand mechanisms through traditional catalogue purchasing and request for purchase (RFP), or through more dynamic auctions and exchanges;
Enabling supply-chain management through collaborative planning and inventory handling; and
Other value-added services, such as rating, secured payment, financial handling, certification services, and notification services.

Thus, e-marketplaces can be developed as entities that use public UDDI registries.
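The white/yellow/green page split can be sketched as a miniature in-memory registry. This is an assumption-laden toy, not the UDDI API: the real inquiry calls (find_business, get_businessDetail) exchange XML documents over SOAP, while here they are ordinary method calls on invented data.

```python
# A miniature registry illustrating UDDI's white/yellow/green page split.
# The data model, method names, and sample businesses are invented for
# this sketch; real UDDI registries exchange XML documents over SOAP.

class MiniRegistry:
    def __init__(self):
        self._businesses = {}

    def register(self, name, contact, categories, services):
        self._businesses[name] = {
            "white": {"name": name, "contact": contact},   # identity and contacts
            "yellow": categories,                          # taxonomy classification
            "green": services,                             # technical service info
        }

    def find_business(self, category):
        # Yellow-page inquiry: businesses classified under a category.
        return sorted(
            name for name, rec in self._businesses.items()
            if category in rec["yellow"]
        )

    def get_business_detail(self, name):
        return self._businesses.get(name)

reg = MiniRegistry()
reg.register("AirCo", "sales@airco.example", ["travel", "airline"],
             [{"service": "BookFlight", "wsdl": "http://airco.example/book.wsdl"}])
reg.register("HotelCo", "info@hotelco.example", ["travel", "lodging"],
             [{"service": "ReserveRoom", "wsdl": "http://hotelco.example/room.wsdl"}])

travel_partners = reg.find_business("travel")
```

A discovering Web service would follow up the yellow-page hit with a detail lookup, read the WSDL location from the green page, and bind to the service from there.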
The e-marketplaces are categorized as vertical or horizontal depending on their target market. Vertical e-marketplaces, such as VerticalNet, GlobalNetXChange, and Retailer Market Exchange, target a specific industry sector where participants perform B2B transactions. In particular, Chemdex, E-Steel, DirectAg.com, and many more have been successful in their respective markets. By contrast, horizontal exchanges, such as eBay, are directed at a broad range of clients and businesses.

Web Services Orchestration

By specifying a set of operations in their WSDL documents, Web services make visible to the external world a certain subset of internal business processes and activities. Therefore, the internal business processes must be defined, and some of their activities linked to the operations, before publication of the document. This in turn requires modeling a Web service's back-end business processes as well as the interactions between them. On the other hand, Web services are developed to serve and utilize other Web services. This kind of interaction usually takes the form of a sequence of message exchanges and operation executions, termed a conversation. Although conversations are described independently of the internal flows of the Web services, they result in executions of a set of back-end processes. A Web service and its ensuing internal processes together form what is called a global process.

Intra-Web Service Modeling and Interaction

The Web Services Flow Language (WSFL) (Leymann, 2001), the Web Services Conversation Language (WSCL) (W3C, 2002), the Web Service Choreography Interface (WSCI) (BEA, 2002), and XLANG (Thatte, 2001) are some of the many business process specification languages for Web services. WSFL introduces the notion of activities and flows, which are useful for describing both local business process flows and global message flows between multiple Web services. WSFL models business processes as a set of activities and links.
An activity is a unit of useful work, while a link connects two activities. A link can be a control link, where a decision of which activity to follow is made, or a data link, specifying that a certain datum flows from one activity to another. These activities may be made visible through one or more operations grouped as endpoints. As in WSDL, a set of endpoints defines a service. WSFL defines global message flows in a similar way. A global flow consists of plug links that link up operations of two service providers. Complex services involving more than two service providers are created by recursively defining plug links. XLANG, developed by Microsoft, extends the XML Schema Definition Language (XSDL) to provide a mechanism for process definition and global flow coordination. The extension elements describe the behavioral aspects of a service. A behavior may span multiple operations. An action is an atomic component of a behavior definition. An action element can be an operation, a delay element, or a raise element. A delay element can be of type delayFor or delayUntil: delayFor and delayUntil introduce delays in execution for a process to wait for something to happen (for example, a timeout) and to wait until an absolute date-time has been reached, respectively. Raise elements are used to specify exception handling. Exceptions are handled by invoking the corresponding handler registered with a raise definition. Finally, processes combine actions in different ways: some of them are sequence, switch, while, all, pick, and empty.

Inter-Web Service Modeling and Interaction

Web services must negotiate and agree on a protocol in order to engage in a business transaction on the Web. X-EDI, ebXML, BTP, TPA-ML, cXML, and CBL have been proposed as inter-Web service interaction protocols. We focus on ebXML as it is by far the most successful one. (See Figure 2.)
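The activity/control-link idea behind WSFL can be sketched as a tiny flow interpreter. This is a schematic illustration, not WSFL's XML syntax or semantics: activities are plain callables, control links carry a condition, and the invented order-fulfillment process exists only to exercise the machinery.

```python
# A schematic interpreter for a WSFL-like flow: activities are callables
# over a state dictionary, and control links (source, target, condition)
# decide which activity runs next. The order-handling process is invented.

def run_flow(activities, links, start, state):
    trace = [start]
    current = start
    while True:
        state = activities[current](state)          # execute the activity
        # Follow the first outgoing control link whose condition holds.
        next_act = None
        for src, dst, cond in links:
            if src == current and cond(state):
                next_act = dst
                break
        if next_act is None:                        # no outgoing link: flow ends
            return state, trace
        trace.append(next_act)
        current = next_act

activities = {
    "receive_order": lambda s: {**s, "received": True},
    "check_stock":   lambda s: {**s, "in_stock": s["qty"] <= 10},
    "ship":          lambda s: {**s, "status": "shipped"},
    "backorder":     lambda s: {**s, "status": "backordered"},
}
links = [
    ("receive_order", "check_stock", lambda s: True),
    ("check_stock", "ship",      lambda s: s["in_stock"]),
    ("check_stock", "backorder", lambda s: not s["in_stock"]),
]

final, trace = run_flow(activities, links, "receive_order", {"qty": 3})
```

Data links would additionally pin down which parts of the state each activity may read or write; here the whole state dictionary flows along every link for brevity.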
In ebXML (http://www.ebxml.org/), parties that engage in a transaction have Collaboration Protocol Profiles (CPPs) that they register at ebXML registries. A CPP contains the following:

Process Specification Layer—Details the business transactions that form the collaboration. It also specifies the order of business transactions.
Delivery Channels—Describes a party's message receiving and sending characteristics. A specification can contain more than one delivery channel.
Document Exchange Layer—Deals with the processing of business documents, such as digital signatures, encryption, and reliable delivery.
Transport Layer—Identifies the transport protocols to be used with the endpoint addresses, along with other properties of the transport layer. The transport protocols could be SMTP, HTTP, and FTP.

Figure 2: Intra- and inter-Web service modeling and interaction.

When a party discovers another party's CPP, they negotiate an agreement and form a Collaboration Protocol Agreement (CPA). The intent of the CPA is not to expose the business process internals of the parties but to make visible only the processes that are involved in interactions between the parties. Message exchange between the parties can be facilitated with the ebXML Messaging Service (ebMS). A CPA and the business process specification document it references define a conversation between the parties. A typical conversation consists of multiple business transactions, which in turn may involve a sequence of message exchanges for requests and replies. Although a CPA may refer to multiple business process specification documents, any conversation is allowed to involve only a single process specification document.
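CPA formation can be pictured as finding the common ground between two CPPs. The sketch below is a strong simplification of the real ebXML CPP/CPA schemas: the dictionary fields and party names are invented, and "negotiation" is reduced to intersecting the transports and business processes both sides support.

```python
# Illustrative sketch of CPA formation: two parties' CPPs are compared
# and the agreement records only capabilities both sides support. The
# CPP fields here are a drastic simplification of the ebXML CPP schema.

def form_cpa(cpp_a, cpp_b):
    shared_transports = [t for t in cpp_a["transports"] if t in cpp_b["transports"]]
    shared_processes = [p for p in cpp_a["processes"] if p in cpp_b["processes"]]
    if not shared_transports or not shared_processes:
        return None                          # no basis for an agreement
    return {
        "parties": (cpp_a["party"], cpp_b["party"]),
        "transport": shared_transports[0],   # "negotiate": take the first common one
        "processes": shared_processes,
    }

buyer = {"party": "BuyerInc", "transports": ["HTTP", "SMTP"],
         "processes": ["PurchaseOrder", "Invoice"]}
seller = {"party": "SellerCorp", "transports": ["HTTP", "FTP"],
          "processes": ["PurchaseOrder"]}

cpa = form_cpa(buyer, seller)
```

The resulting agreement exposes only the shared processes, matching the CPA's stated intent of hiding each party's internal processes from the other.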
Conceptually, the B2B servers of the parties involved are responsible for managing CPAs and for keeping track of the conversations. They also interface the operations defined in a CPA with the corresponding internal business processes.

Figure 3: ebXML service-to-service interaction.

Web Services Platforms

Web services platforms are the technologies, means, and methods available to build and operate Web services. Platforms have been developed and changed over the course of time. A classification into four generations of platform technology should help to structure the space:

First Generation: HTML and CGI—Characterized by Web servers, static HTML pages, HTML FORMS for simple dialogs, and the Common Gateway Interface (CGI) to connect Web servers to application programs, mostly Perl or Shell scripts. (See Figure 3.)
Second Generation: Java—Server-side dynamic generation of HTML pages and user session support; the Java servlet interface became popular for connecting to application programs.
Third Generation: Application Server—Richer development and run-time environments, with J2EE as the foundation for application servers that later evolved toward the fourth generation.
Fourth Generation: Web Services—Characterized by the introduction of XML and WSDL interfaces for Web services with SOAP-based messaging. A global service infrastructure for service registration and discovery emerged: UDDI. Dynamic Web services aggregation is characterized by flow systems, business negotiations, agent technology, etc.

Figure 4: Basic four-tier architecture for Web services.
Technically, Web services have been built according to the pattern of an n-tier architecture that consists of a front-end tier, firewall (FW), load balancer (LB), a Web-server tier (WS), an application-server (AS) tier, and a back-end tier for persistent data, or the database tier (DB). (See Figure 4.)

First Generation: HTML and CGI

The emergence of the World Wide Web facilitated the easy access and decent appearance of linked HTML mark-up pages in a user's browser. In the early days, it was mostly static HTML content. Only passive information services could be built, providing users with nothing more than the capability of navigating through static pages. However, from the very beginning HTML supported FORMS, which allowed users to enter text or select from multiple-choice menus. FORMS were treated specially by Web servers: they were passed on through CGI to small applications, mostly Perl or Shell scripts, which could read the user's input, perform the respective actions, and return an HTML page that could then be displayed in the user's browser. This primitive mechanism enabled a first generation of services on the Web beyond pure navigation through static content.

Second Generation: Java

With the growth of the Web and the desire for richer services such as online shopping and booking, the initial means to build Web services quickly became too primitive. Java applets brought graphical interactiveness to the browser side, and Java appeared as the language of choice for Web services. Servlets provided a better interface between the Web server and the application. Technology to support dynamic generation of HTML pages at the server side was introduced: JSP (Java Server Pages) by Sun Microsystems, ASP (Active Server Pages) by Microsoft, and PHP pages in the Linux world enabled separation of presentation—the appearance of pages in browsers—from content data.
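The first-generation request cycle can be sketched in a few lines: the Web server hands the submitted FORM data to a small program, which writes back a header and a complete HTML page. The greeting "application" below is invented; only the FORM-data parsing and the Content-Type convention reflect the CGI pattern described above.

```python
from urllib.parse import parse_qs
from html import escape

# A sketch of the first-generation CGI pattern: the Web server passes
# the submitted FORM data (here, a URL-encoded query string) to a small
# program, which returns a complete HTML document. A real CGI script
# would write this to standard output for the server to relay.

def handle_form(query_string):
    fields = parse_qs(query_string)
    name = escape(fields.get("name", ["stranger"])[0])
    return (
        "Content-Type: text/html\r\n\r\n"
        f"<html><body><h1>Hello, {name}!</h1></body></html>"
    )

page = handle_form("name=Alice&color=blue")
```

Every request spawned a fresh process and rebuilt the whole page, which is precisely the overhead that servlets and server-side page technologies in the later generations set out to remove.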
Templates and content were then merged on the fly at the server in order to generate the final page returned to the browser. Since user identification was critical for business services, user log-ins and user sessions were introduced. Applications were becoming more complex, and it turned out that there was significant overlap in the common functions needed by many services, such as session support, connectivity to persistent databases, and security functions.

Figure 5: The J2EE platform.

Third Generation: Application Server

The observation that many functions were shared and common among Web services drove the development toward richer development environments based on the Java language and Java libraries. A cornerstone of these environments became J2EE (Java 2 Platform, Enterprise Edition), a Java platform designed for enterprise-scale computing. Sun Microsystems (together with industry partners such as IBM) designed J2EE (Figure 5) to simplify application development for Web services by decreasing the need for programming through reusable modular components and by providing standard functions such as session support and database connectivity. J2EE primarily manifests in a set of libraries used by application programs performing the various functions. Web service developers still had to assemble all the pieces, link them together, connect them to the Web server, and manage the various configurations. This led to the emergence of software packages that could be deployed more easily on a variety of machines. These packages later became application servers. They significantly reduced the amount of configuration work during service deployment, so that service developers could spend more time on the business logic and the actual function of the service. Most application servers are based on J2EE technology.
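The template-merge step that JSP, ASP, and PHP pages perform can be illustrated with the standard library's `string.Template`. The page markup and the product data below are invented; the point is only the separation of presentation (the template) from content (the values substituted at request time).

```python
from string import Template

# The server-side merge that JSP/ASP/PHP pages perform: a template holds
# the presentation, and content data is substituted into it on the fly.
# The markup and product data are invented for this sketch.

PAGE = Template(
    "<html><body>"
    "<h1>$title</h1>"
    "<p>Price: $$${price}</p>"       # $$ renders a literal dollar sign
    "</body></html>"
)

def render(title, price):
    # In a real server this would run per request, with content
    # typically fetched from a database rather than passed in.
    return PAGE.substitute(title=title, price=f"{price:.2f}")

html = render("Widget", 9.5)
```

Because the template never changes while the data does, designers can edit the presentation without touching application code, which is exactly the separation the second- and third-generation platforms institutionalized.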
Examples are IBM's WebSphere suite, BEA's WebLogic environment, the Sun ONE Application Framework, and Oracle's 9i application server. (See Figure 5.)

Fourth Generation: Web Services

Prior generations of Web services mostly focused on end users—people accessing services from Web browsers. However, accessing services from other services, rather than from browsers, turned out to be difficult. This circumstance prevented the emergence of Web service aggregation for a long time. Web service aggregation means that users would only have to contact one Web service, and this service would then resolve the users' requests with further requests to other Web services. HTML is a language defined for rendering and presenting content in Web browsers. It does not per se allow separating content from presentation information. With its advent, XML became the language of choice for Web services, providing interfaces that could be accessed not only by users through Web browsers but also by other services. XML is now pervasively used in Web services messaging (mainly via SOAP) and for Web service interface descriptions (WSDL). In regard to platforms, XML enhancements were added to J2EE and application servers. The introduction of XML is the major differentiator between Web services platforms of the third and the fourth generation in this classification. A major step toward service-to-service integration was the introduction of the UDDI service (see the section Web Services Discovery above). Three major platforms for further Web services interaction and integration are Sun Microsystems' Sun ONE (Open Net Environment), IBM's WebSphere, and Microsoft's .NET.

Sun ONE—Sun's standards-based software architecture and platform for building and deploying services on demand. Sun ONE's architecture is built around existing business assets—data, applications, reports, and transactions—referred to as the DART model.
Major standards are supported: XML, SOAP, J2EE, UDDI, LDAP, and ebXML. The architecture is composed of several product lines: the iPlanet Application Framework (JATO), Sun's J2EE application framework for enterprise Web services development; application server; portal server; integration server; directory server; e-commerce components; the Solaris Operating Environment; and development tools.
IBM WebSphere—IBM's platform to build, deploy, and integrate e-business, including components such as foundation and tools, reach and user experience, business integration, and transaction servers and tools.
Microsoft .NET—Microsoft's platform for providing lead technology for future distributed applications inherently seen as Web services. With Microsoft .NET, Web services' application code is built in discrete units, XML Web services, which handle a specified set of tasks. Because standard interfaces based on XML simplify communication among software, XML Web services can be linked together into highly specific applications and experiences. The vision is that the best XML Web services from any provider around the globe can be used to create a needed solution quickly and easily.

Date posted: 14/08/2014, 09:22
