Semantic Interoperability. Semantic interoperability "is a dynamic enterprise capability derived from the application of special software technologies (such as reasoners, inference engines, ontologies, and models) that infer, relate, and classify the implicit meanings of digital content without human involvement—which in turn drive adaptive business processes, enterprise knowledge, business rules, and software application interoperability" (Pollock & Hodgson, 2004, p. 6).

Semantic Knowledge Transparency. Semantic knowledge transparency is defined as the dynamic, on-demand, and seamless flow of relevant and unambiguous, machine-interpretable knowledge resources within organizations and across inter-organizational systems of business partners engaged in collaborative processes.

TBox. A TBox contains intensional knowledge in the form of a terminology and is built through declarations that describe general properties of concepts (Baader et al., 2003; Gomez-Perez et al., 2004).

This work was previously published in Semantic Web Technologies and E-Business: Toward the Integrated Virtual Organization and Business Process Automation, edited by A. Salam and J. Stevens, pp. 255-286, copyright 2007 by IGI Publishing (an imprint of IGI Global).

Chapter 8.7
Enhancing E-Business on the Semantic Web through Automatic Multimedia Representation

Manjeet Rege, Wayne State University, USA
Ming Dong, Wayne State University, USA
Farshad Fotouhi, Wayne State University, USA

ABSTRACT

With the evolution of the next-generation Web, the Semantic Web, e-business can be expected to grow into a more collaborative effort in which businesses compete with each other by collaborating to provide the best product to a customer. Electronic collaboration involves the interchange of many kinds of data, multimedia data among them. Digital multimedia data in various formats have increased tremendously in recent years on the Internet. An automated process that can represent multimedia data in a meaningful way for the Semantic Web is highly desired. In this chapter, we propose an automatic multimedia representation system for the Semantic Web. The proposed system learns a statistical model based on the domain-specific training data and performs automatic semantic annotation of multimedia data using eXtensible Markup Language (XML) techniques. We demonstrate the advantage of annotating multimedia data using XML over the traditional keyword-based approaches and discuss how it can help e-business.

INTRODUCTION

An Internet user typically conducts separate individual e-business transactions to accomplish a certain task. A tourist visiting New York might purchase airfare tickets and tickets to a concert in New York separately. With the evolution of the Semantic Web, as shown in Figure 1, the user can conduct one collaborative e-business transaction for the two purchases. Moreover, he/she can also take a virtual tour of New York City online, which actually might be a collection of all videos, images, and songs on New York appearing anywhere on the World Wide Web. With the continuing growth and reach of the Web, the multimedia data available on it continue to grow on a daily basis.
For a successful collaborative e-business, in addition to other kinds of data, it is important to be able to organize and search the multimedia data for the Semantic Web. With the Semantic Web being the future of today's World Wide Web, there has to be an efficient way to represent the multimedia data automatically for it. Multimedia data pose a great challenge to document indexing and retrieval because they are highly unstructured and their semantics are implicit in their content. Moreover, most of the multimedia content appearing on the Web has no description available with it in terms of keywords or captions. From the Semantic Web point of view, this information is crucial because it describes the content of multimedia data and would help represent it in a semantically meaningful way. Manual annotation is feasible on a small set of multimedia documents but does not scale as the number of multimedia documents increases. Hence, performing manual annotation of all Web multimedia data while "moving" them to the Semantic Web domain is an impossible task. This, we believe, is a major challenge in transforming today's Web multimedia data into tomorrow's Semantic Web data.

In this chapter, we propose a generic automatic multimedia representation solution for the Semantic Web: an XML-based (Bray, Paoli, & Sperberg-McQueen, 1998) automatic multimedia representation system. The proposed system is implemented using images as an example and performs domain-specific annotation using XML. Specifically, our system "learns" from a set of domain-specific training images made available to it a priori. Upon receiving a new image from the Web that belongs to one of the semantic categories the system has learned, the system generates appropriate XML-based annotation for the new image, making it "ready" for the Semantic Web. Although the proposed system has been described from the perspective of images, in general it is applicable to many kinds of multimedia data available on the Web today. To the best of our knowledge, there has been no work done on automatic multimedia representation for the Semantic Web using the semantics of XML. The proposed system is the first work in this direction.

Figure 1. Collaborative e-business scenario on the Semantic Web

BACKGROUND

The term e-business in general refers to online transactions conducted on the Internet. These are mainly classified into two categories: business-to-consumer (B2C) and business-to-business (B2B). One of the main differences between these two kinds of e-business is that B2C, as the name suggests, applies to companies that sell their products or offer services to consumers over the Internet, whereas B2B refers to online transactions conducted between two companies. Since its initial introduction in the late 1990s, e-business has grown to include services such as car rentals, health services, movie rentals, and online banking. The Web site CIO.com (2006) reports that North American consumers spent $172 billion shopping online in 2005, up from $38.8 billion in 2000. Moreover, e-business is expected to grow even more in the coming years. By 2010, consumers are expected to spend $329 billion each year online. We expect the evolving Semantic Web to play a significant role in enhancing the way e-business is done today.
However, as mentioned in the earlier section, there is a need to represent the multimedia data on the Semantic Web in an efficient way. In the following section, we review some of the related work done on the topic.

Ontology/Schema-Based Approaches

Ontology-based approaches have been frequently used for multimedia annotation and retrieval. Hyvonen, Styrman, and Saarela (2002) proposed ontology-based image retrieval and annotation of graduation ceremony images by creating hierarchical annotations. They used Protégé (n.d.) as the ontology editor for defining the ontology and annotating images. Schreiber, Dubbeldam, Wielemaker, and Wielinga (2001) also performed ontology-based annotation of ape photographs, in which they used the same ontology-definition and annotation tool and used Resource Description Framework (RDF) Schema as the output language. Nagao, Shirai, and Squire (2001) developed a method for associating external annotations with multimedia data appearing on the Web. In particular, they discuss video annotation by performing automatic segmentation of video, semiautomatic linking of video segments, and interactive naming of people and objects in video frames. More recently, Rege, Dong, Fotouhi, Siadat, and Zamorano (2005) proposed to annotate human brain images using XML by following the MPEG-7 (Manjunath, 2002) multimedia standard. The advantages of using XML to store meta-information (such as patient name, surgery location, etc.), as well as brain anatomical information, have been demonstrated in a neurosurgical domain. The major drawback of the approaches mentioned previously is that the image annotation is performed manually. There is extra effort needed on the user's side in creating the ontology and performing the detailed annotation. It is highly desirable to have a system that performs automatic semantic annotation of multimedia data on the Internet.

Keyword-Based Annotations

Automatic image annotation using keywords has recently received extensive attention in the research community. Mori, Takahashi, and Oka (1999) developed a co-occurrence model, in which they looked at the co-occurrence of keywords with image regions. Duygulu, Barnard, Freitas, and Forsyth (2002) proposed a method to describe images using a vocabulary of blobs. First, regions are created using a segmentation algorithm. For each region, features are computed, and then blobs are generated by clustering the image features for these regions across images. Finally, a translation model translates the set of blobs of an image to a set of keywords. Jeon, Lavrenko, and Manmatha (2003) introduced a cross-media relevance model that learns the joint distribution of a set of regions and a set of keywords, rather than the correspondence between a single region and a single keyword. Feng, Manmatha, and Lavrenko (2004) proposed a method of automatic annotation that partitions each image into a set of rectangular regions. The joint distribution of the keyword annotations and low-level features is computed from the training set and used to annotate testing images. High annotation accuracy has been reported. The reader is referred to Barnard, Duygulu, Freitas, and Forsyth (2003) for a comprehensive review of this topic. As we point out in the section "XML-Based Annotation," keyword annotations do not fully express the semantic meaning embedded in the multimedia data.
In this chapter, we propose an automatic multimedia representation system for the Semantic Web using the semantics of XML, which enables efficient multimedia annotation and retrieval based on the domain knowledge. The proposed work is the first attempt in this direction.

PROPOSED FRAMEWORK

In order to represent multimedia data for the Semantic Web, we propose to perform automatic multimedia annotation using XML techniques. Though the proposed framework is applicable to multimedia data in general, we provide details about the framework using image annotation as a case study.

XML-Based Annotation

Annotations are domain-specific semantic information assigned with the help of a domain expert to semantically enrich the data. The traditional approach practiced by image repository librarians is to annotate each image manually with keywords or captions and then search on those captions or keywords using a conventional text search engine. The rationale here is that the keywords capture the semantic content of the image and help in retrieving the images. This technique is also used by television news organizations to retrieve file footage from their videos. Such techniques allow text queries and are successful in finding the relevant pictures. The main disadvantage of manual annotation is the cost and difficulty of scaling it to large numbers of images.

MPEG-7 (Manjunath, 2002, p. 8) describes the content, "the bits about the bits," of a multimedia file such as an image or a video clip. The MPEG-7 standard has been developed after many rounds of careful discussion, and it is expected that this standard will be used in searching and retrieving all types of media objects. It proposes to store low-level image features, annotations, and other meta-information in one XML file that contains a reference to the location of the corresponding image file. XML has brought great features and promising prospects to the future of the Semantic Web and will continue to play an important role in its development. XML keeps content, structure, and representation apart and is a much more adequate means for knowledge representation. It can represent semantic properties through its syntactic structure, that is, by the nesting or sequential ordering relationships among elements (XML tags). The advantage of annotating multimedia using XML can best be explained with the help of an example. Suppose we have a New York image (shown in Figure 2) with keyword annotations of Statue of Liberty, Sea, Clouds, Sky. Instead of simply using keywords as annotation for this image, consider now that the same image is represented in an XML format.

Note that the XML representation of the image can conform to any domain-specific XML schema. For the sake of illustration, consider the XML schema and the corresponding XML representation of the image shown in Figure 3. This XML schema stores foreground and background object information, along with other meta-information, with keywords along various paths of the XML file. Compared with keyword-based approaches, the XML paths from the root node to the keywords are able to fully express the semantic meaning of the multimedia data. In the case of the New York image, semantically meaningful XML annotations would be "image/semantic/foreground/object = Statue of Liberty, image/semantic/foreground/object = Sea, image/semantic/background/object = Sky, image/semantic/background/object = Clouds".
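For illustration only, the following minimal Python sketch (standard library) builds such an XML representation for the New York image. The element names follow the annotation paths above; the href attribute and file name are hypothetical additions, not part of the schema described in the chapter.

```python
# A sketch of the XML representation discussed above. Element names
# (image/semantic/foreground/object, ...) follow the annotation paths
# in the text; the href attribute is an assumed way to reference the
# image file location, as MPEG-7-style descriptions do.
import xml.etree.ElementTree as ET

image = ET.Element("image", href="new_york.jpg")
semantic = ET.SubElement(image, "semantic")

foreground = ET.SubElement(semantic, "foreground")
for obj in ("Statue of Liberty", "Sea"):
    ET.SubElement(foreground, "object").text = obj

background = ET.SubElement(semantic, "background")
for obj in ("Sky", "Clouds"):
    ET.SubElement(background, "object").text = obj

print(ET.tostring(image, encoding="unicode"))
# <image href="new_york.jpg"><semantic>
#   <foreground><object>Statue of Liberty</object><object>Sea</object></foreground>
#   <background><object>Sky</object><object>Clouds</object></background>
# </semantic></image>
```

Each root-to-leaf path in this tree corresponds to one of the annotation paths quoted above, which is what allows the nesting itself to carry semantics.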
The semantics in the XML paths provide an added advantage by differentiating the objects in the foreground and background and giving more meaningful annotation. We emphasize that the annotation performed using our approach is domain-specific knowledge. The same image can have a different annotation under a different XML schema that highlights certain semantic characteristics of importance to that domain. We simply use the schema of Figure 3, which presents image foreground and background object information, as a running example.

Overview of System Architecture

The goal of the proposed system is to represent multimedia data obtained from the Web in a meaningful XML format. Consequently, these data can be "moved" to the Semantic Web in an automatic and efficient way. For example, as shown in Figure 4, the system first receives an image from the Web. The image could be received from a Web image provider, an independent module outside of the system that simply fetches domain-specific images from the Web and passes them on to our system. The Web image provider could also be a "Web spider" that "crawls" among domain-specific Web data sources and procures relevant images. The image is then preprocessed by two other modules, namely, the image divider and the feature extractor. An image usually contains several regions. Extracting low-level features from different image regions is typically the first step of automatic image annotation, since regions may have different contents and represent different semantic meanings. The image regions could be determined through either image segmentation (Shi & Malik, 1997) or image cutting in the image divider. For low-level feature extraction, we used some of the features standardized by MPEG-7. The low-level features extracted from all the regions are passed on to the automatic annotator. This module learns a statistical model that links image regions and XML annotation paths from a set of domain-specific training images. The training image database can contain images belonging to various semantic categories, represented and annotated in XML format. The annotator learns to annotate new images that belong to at least one of the many semantic categories that the annotator has been trained on. The output of the automatic annotator is an XML representation of the image.

Figure 2. Comparison of keyword annotation and XML-path-based annotation
Figure 3. An example of an XML schema and the corresponding XML representation of an image
Figure 4. System architecture

Statistical Model for Automatic Annotation

In general, image segmentation is a computationally expensive as well as error-prone task (Feng et al., 2004). As an alternative simple solution, we have the image divider partition each image into a set of rectangular regions of equal size. The feature extractor extracts low-level features from each rectangular region of every image and constructs a feature vector. By learning the joint probability distribution of XML annotation paths and low-level image features, we perform the automatic annotation of a new image.

Let X denote the set of XML annotation paths, T denote the domain-specific training images in XML format, and let t be an image belonging to T. Let x_t be the subset of X containing the annotation paths for t. Also, assume that each image is divided into n rectangular regions of equal size. Consider a new image q not in the training set.
Let f_q = {f_{q1}, f_{q2}, \ldots, f_{qn}} denote the set of region feature vectors for q. In order to perform automatic annotation of q, we model the joint probability of f_q and any arbitrary annotation path subset x of X as follows:

P(x, f_q) = P(x, f_{q1}, f_{q2}, \ldots, f_{qn})    (1)

We use the training set T of annotated images to estimate the joint probability of observing x and {f_{q1}, f_{q2}, \ldots, f_{qn}} by computing the expectation over all the images in the training set:

P(x, f_{q1}, f_{q2}, \ldots, f_{qn}) = \sum_{t \in T} P(t) \, P(x, f_{q1}, f_{q2}, \ldots, f_{qn} \mid t)    (2)

We assume that, given a training image t, the events of observing x and f_{q1}, f_{q2}, \ldots, f_{qn} are mutually independent of each other and express the joint probability in terms of P_A, P_B, and P_C as follows:

P(x, f_{q1}, \ldots, f_{qn}) = \sum_{t \in T} \Big\{ P_A(t) \prod_{a=1}^{n} P_B(f_{qa} \mid t) \prod_{path \in x} P_C(path \mid t) \prod_{path \in X \setminus x} \big(1 - P_C(path \mid t)\big) \Big\}    (3)

where P_A is the prior probability of selecting each training image, P_B is the density function responsible for modeling the feature vectors, and P_C is a multiple Bernoulli distribution for modeling the XML annotation paths.

In the absence of any prior knowledge of the training set, we assume that P_A follows a uniform prior and can be expressed as:

P_A(t) = \frac{1}{\lVert T \rVert}    (4)

where \lVert T \rVert is the size of the training set. For the distribution P_B, we use a nonparametric, kernel-based density estimate:

P_B(f \mid t) = \frac{1}{n} \sum_{i=1}^{n} \frac{\exp\{-(f - f_i)^T \Sigma^{-1} (f - f_i)\}}{\sqrt{2^k \pi^k \lVert \Sigma \rVert}}    (5)

where {f_1, f_2, \ldots, f_n} is the set of low-level feature vectors computed for the n rectangular regions of image t, k is the dimensionality of the feature space, and \Sigma is a diagonal covariance matrix that is constructed empirically for best annotation performance.

In the XML representation of images, every annotation path either occurs or does not occur at all for an image. Moreover, as we annotate images based on object presence, not prominence, in an image, an annotation path, if it occurs, can occur at most once in the XML representation of the image. As a result, it is reasonable to assume that the density function P_C follows a multiple Bernoulli distribution:

P_C(path \mid t) = \frac{\mu \, \delta_{path,t} + N_{path}}{\mu + \lVert T \rVert}    (6)

where \mu is a smoothing parameter, \delta_{path,t} equals one if the path occurs in the annotation of image t and zero otherwise, and N_{path} is the total number of training images that contain this path in their annotation.

EXPERIMENTAL RESULTS

Our image database contains 1,500 images obtained from the Corel data set, comprising 15 image categories with 100 images in each category. The Corel image data set contains images from different semantic categories with keyword annotations performed by Corel employees. In order to conduct our experiments, we require a training image database representing images in XML format. Each XML file should contain annotation, low-level features, and other meta-information stored along different XML paths. In the absence of such publicly available data, we manually converted each image in the database to an XML format conforming to the schema shown in Figure 3. We performed our experiments on five randomly selected image categories. Each image category represents a distinct semantic concept. In the experiments, 70% of the data were randomly selected as the training set, while the remainder was used for testing.

Automatic Annotation Results

Given a test image, we calculate the joint probability of the low-level feature vector and the XML annotation paths in the training set. We select the four paths with the highest joint probability as the annotation for the image.
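To make the estimation concrete, here is a minimal NumPy sketch of Equations (1)-(6). It is not the authors' implementation: the data layout (each training image as a dict with an (n, k) array of region features and a set of annotation paths) and all names are illustrative assumptions, and, since paths are ranked individually for the top-4 selection, the (1 - P_C) factors of Equation (3) for absent paths are omitted.

```python
import numpy as np

def annotate(q_features, training, all_paths, sigma_diag, mu, top=4):
    """Rank XML annotation paths for a query image by the joint
    probability P(path, f_q); sigma_diag is the 1-D diagonal of Sigma.
    (A real implementation would work in log space to avoid underflow.)"""
    T = len(training)
    k = len(sigma_diag)
    inv_sigma = 1.0 / sigma_diag                     # Sigma is diagonal
    norm = np.sqrt((2.0 * np.pi) ** k * np.prod(sigma_diag))

    # P_B(f_q | t): kernel density of all query regions given t (Eq. 5)
    p_b = np.empty(T)
    for j, t in enumerate(training):
        prob = 1.0
        for f in q_features:                         # one query region
            d = f - t["features"]                    # (n, k) differences
            kern = np.exp(-np.sum(d * d * inv_sigma, axis=1))
            prob *= kern.sum() / (len(t["features"]) * norm)
        p_b[j] = prob

    scores = {}
    for path in all_paths:
        n_path = sum(path in t["paths"] for t in training)  # N_path
        total = 0.0
        for j, t in enumerate(training):
            delta = 1.0 if path in t["paths"] else 0.0
            p_c = (mu * delta + n_path) / (mu + T)   # Bernoulli (Eq. 6)
            total += (1.0 / T) * p_b[j] * p_c        # uniform P_A (Eq. 4)
        scores[path] = total
    return sorted(scores, key=scores.get, reverse=True)[:top]
```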
Compared with other approaches to image annotation (Duygulu et al., 2002; Feng et al., 2004), our annotation results provide a more meaningful description of a given image. Figure 5 shows some examples of our annotation results. We can clearly see that the XML-path-based annotation carries richer semantic meaning than the original keywords provided by Corel.

Figure 5. Examples of top annotation in comparison with Corel keyword annotation

We evaluate the image annotation performance in terms of recall and precision, computed for every annotation path in the test set as follows:

recall = q / r,    precision = q / s

where q is the number of images correctly annotated by an annotation path, r is the number of images having that annotation path in the test set, and s is the number of images annotated by the same path. In Table 1 we report the results for all 148 paths in the test set, as well as for the 23 best paths, as in Duygulu et al. (2002) and Feng et al. (2004).

Table 1. Annotation results (the number of paths with recall > 0 is 50)

                             Results on all 148 paths    Results on top 23 paths
  Mean per-path recall       0.22                        0.83
  Mean per-path precision    0.21                        0.73

Retrieval Results

Given specific query criteria, the XML representation helps in efficient retrieval of images over the Semantic Web. Suppose a user wants to find images that have an airplane in the background and people in the foreground. State-of-the-art search engines require the user to supply individual keywords such as "airplane," "people," and so forth, or some combination of keywords as a query. The union of the retrieved images over all possible combinations of the aforementioned query keywords is sure to contain images satisfying the user-specified criteria. However, a typical search engine user searching for images is unlikely to look beyond the first 15-20 retrievals, which may all be irrelevant in this case. As a result, the user query in this scenario goes unanswered in spite of images satisfying the specified criteria being present on the Web. With the proposed framework, the query could be answered in an efficient way. Since all the images on the Semantic Web are represented in an XML format, we can use XML querying technologies such as XQuery (Chamberlin, Florescu, Robie, Simeon, & Stefanascu, 2001) and XPath (Clark & DeRose, 1999) to retrieve images for the query "image/semantic/background/object = plane, image/semantic/foreground/object = people" (a minimal sketch of such a query appears at the end of this section). This is unachievable with keyword-based queries and hence is a major contribution of the proposed work.

Figure 6. Ranked retrieval for the query image/semantic/background/object = "sky"

Figure 6 shows some examples of the retrieval results. In Table 2, we also report the mean average precision obtained for ranked retrieval, as in Feng et al. (2004). Since the proposed work is the first of its kind to automatically annotate images using XML paths, we were unable to make a direct comparison.
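As a rough illustration of the structured query discussed above, the following Python sketch evaluates the compound airplane/people criterion over a collection of XML image representations using standard-library path expressions. It assumes the root element is image with semantic/foreground and semantic/background children, per the running example; the file names are hypothetical.

```python
# A sketch of structured retrieval over XML image representations,
# assuming instances shaped like the running example of Figure 3.
import xml.etree.ElementTree as ET

def matches(xml_file):
    """True if the image has a plane in the background and people
    in the foreground, per its XML annotation paths."""
    root = ET.parse(xml_file).getroot()
    background = {o.text for o in root.findall("./semantic/background/object")}
    foreground = {o.text for o in root.findall("./semantic/foreground/object")}
    return "plane" in background and "people" in foreground

# Hypothetical collection of annotated images on the Semantic Web.
hits = [f for f in ("img001.xml", "img002.xml") if matches(f)]
```

A full XQuery engine would express the same condition as a single path predicate; the point is that foreground and background hits are distinguished structurally, which a flat keyword index cannot do.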