CHAPTER 15
Content Management Systems

One of the most common uses of a portal is to provide an interface to content management systems (CMSs). Some users may need to get information from the CMS, while others may need to create content. Many portals integrate with a CMS from the same vendor; sometimes the portal ships with the CMS, and in other cases it is a separate product. If you do not have a vendor-supplied integrated solution, you will probably need to develop one using the portlet API and a CMS API.

In this chapter, we discuss the Java Content Repository API (JSR 170) and the WebDAV protocol. We also build a portlet that uses WebDAV to connect to a content store, in this case the open source CMS Apache Slide (http://jakarta.apache.org/slide). Our portlet should work with any WebDAV server, so you can use your own CMS if it supports WebDAV.

Overview of Content Management Systems

Content management is a broad field that encompasses a wide range of software applications. Document management, imaging, product data management, digital media and asset management, knowledge management, and web content management are some of the different types of content management systems. Usually, all of these different systems are grouped together into a field called enterprise content management.

From a technical perspective, many of these systems share a common base of functionality and features. All of them have a content repository, where content is stored in a database or on a file system. Most systems use some kind of hierarchical organization for the content, although you will certainly find CMS applications where all of the content is at the same level. Most CMS packages with a hierarchical view actually store all of the content in a single database table or directory. The relationships for the hierarchy are stored in the database. This provides advantages for access and retrieval, and allows the same piece of content to appear in two or more different locations.
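
The flat-storage idea in the last paragraph can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class FlatContentStore, its methods, and the folder paths are hypothetical names invented here, not part of any CMS product or API discussed in this chapter. It shows content held once in a single keyed store, with the hierarchy kept as separate relationships, so one item can be listed in two folders.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; these names do not come from a real CMS.
class FlatContentStore {
    // The "single table" of content, keyed by content ID.
    private final Map<String, String> contentById = new LinkedHashMap<>();
    // The relationship table: folder path -> IDs of the items shown in that folder.
    private final Map<String, List<String>> idsByFolder = new LinkedHashMap<>();

    void store(String id, String body) {
        contentById.put(id, body);
    }

    // Records that an existing item appears in a folder; the content is not copied.
    void link(String folderPath, String id) {
        if (!contentById.containsKey(id)) {
            throw new IllegalArgumentException("Unknown content ID: " + id);
        }
        idsByFolder.computeIfAbsent(folderPath, k -> new ArrayList<>()).add(id);
    }

    // Resolves a folder listing by following the relationships back to the flat store.
    List<String> list(String folderPath) {
        List<String> bodies = new ArrayList<>();
        for (String id : idsByFolder.getOrDefault(folderPath, Collections.emptyList())) {
            bodies.add(contentById.get(id));
        }
        return bodies;
    }

    public static void main(String[] args) {
        FlatContentStore store = new FlatContentStore();
        store.store("doc-1", "Quarterly report");
        store.link("/Engineering/Reports", "doc-1");
        store.link("/Finance/Archive", "doc-1");
        System.out.println(store.list("/Engineering/Reports"));
        System.out.println(store.list("/Finance/Archive"));
    }
}
```

Because the folders hold only IDs, updating the stored item once changes what every folder shows.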

The next piece of the CMS puzzle is content delivery. Most web content management packages are optimized for content delivery, and can easily be plugged into a web-based application. In some cases, part of the vendor-provided content delivery is a display portlet that can save you a lot of development effort. One disadvantage of CMS tools with content delivery is that they often include page assembly features, for delivering a web page with navigation, headers, and footers. This is not very useful from a portal perspective, where the portal page provides the interface. Usually there is a way to access the raw content directly, without any of the page assembly.

A common use of content management systems is to introduce workflow into the content production process. A classic business use case for a CMS project involves a content producer creating a Microsoft Word file locally and then uploading it into the CMS. The producer's manager gets a review notice through e-mail and logs into the CMS. The manager approves the content inside the CMS, and the content is ready for delivery. Most enterprise CMS applications have a workflow or an approval process for publishing built in. The level of automation and custom development varies from CMS to CMS. It can be very easy to create a complex content review process that turns out to be unwieldy for the end user in practice. Bottlenecks will start to appear, especially with different levels of approval. Creating a "ready for review" portlet for a CMS is usually a straightforward development project involving a proprietary API.

Content production and authoring is a newer technology that has become more popular with the availability of rich text or HTML authoring controls for web pages. These content-creation tools could be Java applets, ActiveX controls, DHTML, Flash, or any other client-side technology.
Some CMS applications come with these as part of an authoring workspace. The rapid adoption of WebDAV in desktop applications means that these controls may not be the best solution for your users, especially if they are already familiar with tools like Macromedia Dreamweaver MX. It is easy enough to embed one of the client-side HTML authoring tools into a portlet; saving the HTML onto the server will depend on the CMS.

Personalization is a feature that somewhat overlaps with portals. Your content management system may support varying levels of personalization, some of which may coincide with your portal vendor's personalization product. If you have to choose between the two, the portal's personalization will work for other applications running on the portal, but the CMS personalization will be portable across multiple portals. Ideally, future versions of the portlet API will standardize personalization, so this will become less of a problem.

Almost every CMS includes some level of search support, whether it is a simple SQL query interface or an integrated search engine like Verity or Lucene. The Java Content Repository API defines a standard for queries and query languages that should gain support from Java-based CMS vendors. The trickiest part of external search engine integration with a CMS and a portal will be indexing the CMS properly. If your site includes multiple user groups with different access to content, you should consider a federated search approach, as we described in Chapter 10 for Lucene. Commercial search vendors will have their own recommendations, and will probably offer ready-made JSR 168 portlets either now or in the near future.

Integration with a Content Management System

Most portal deployments require integration with at least one content management system; often, integration with several different vendors' systems is necessary.
From a project management perspective, bringing content into a portal requires several steps. The first is to identify which content should be available and where the content is coming from. The next step is to determine which sets of users should see which content. The third step involves identifying which functionality in the content management system belongs in a portlet. After these business process steps are completed, you can start planning the technical architecture of the integration. The first question is whether the vendor already provides a JSR 168 portlet. Many vendors write portlets for their content management systems, which can make your job much easier. Two commercial vendors with JSR 168 portlets at the time of writing are Stellent and Documentum; other vendors likely have products on the way.

If you do not have a ready-made portlet application to roll into your application, you are going to need to look into the integration APIs for the content management system. There are two major standards for CMS APIs: WebDAV and the new Java Content Repository API (JSR 170). WebDAV is a set of extensions to the HTTP protocol for versioning, accessing metadata, making directories, locking files, and checking files in and out, among other things. WebDAV is not tied to a single platform or architecture, although the CMS must specifically implement a WebDAV layer.

The Java Content Repository API (JCR API) is a new standard for Java content management systems. The JCR API defines a standard set of interfaces and classes that CMS clients can use to connect to a CMS and access content and metadata. We discuss both WebDAV and the JCR API in this chapter.

Neither of these APIs covers all of the possible functionality for a CMS. In addition, not every CMS implements one of these APIs. Most will have a separate proprietary API, and you will have to write the integration against it yourself. If there are any servlet/JSP example applications, they should be easily adapted to a portlet application.

You can pull content out of almost any content management system through its database or file system store, but that should be a last-ditch integration step. Of course, if your CMS is 10 years old, running on a legacy platform, and does not have an open API, this may be your only choice. It is probably better at that point to migrate the legacy CMS to something newer, but for lots of reasons that may not make business sense.

Common Problems with CMS and Portals

Some of the most common technical problems with CMS integration with portals are authentication, access control, link rewriting, and content delivery. We can manage authentication with Single Sign-On (SSO) functionality, which we discussed in Chapter 8. If the application is not suitable for SSO, you can collect the correct CMS credentials from the user once, and then store them in the user's portlet preferences.

Access control partly comes from SSO, especially if you have an enterprise-wide set of permissions for your portal and your CMS. If all of your access control is maintained in one directory, you can cut down on technical support, but your software development costs for integration will be high. If you do not have an enterprise access control system, most content management systems will only display content that the user has access to. You will have to manage the permissions yourself, either programmatically or through an administrative GUI.

Link rewriting is another common problem. The links in your CMS content will not stay in the portal. You could write a set of content display adapters that rewrites your HTML content with the appropriate portlet URLs for links. Another approach would be to standardize on an enterprise-wide XML format for content. Each content delivery or content authoring system would be responsible for rendering the XML correctly for display in that system, but creating the correct links would be easy.
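
A content display adapter of the kind described above might look roughly like the following sketch. It is a minimal illustration, not a production rewriter: the class name LinkRewriter, the regular expression (which handles only double-quoted href attributes), and the PATH parameter in the usage example are assumptions made here. In a real portlet, the function passed in would typically build the link from a PortletURL obtained through RenderResponse.createRenderURL(), and the path would need URL encoding.

```java
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical adapter: rewrites repository-relative links so they stay in the portal.
class LinkRewriter {
    // Assumption: links appear as href="..." with double quotes; group 1 is the target.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]*)\"");

    // portletUrlFor maps a CMS path to the URL the portal should serve it from.
    static String rewrite(String html, Function<String, String> portletUrlFor) {
        Matcher matcher = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            String target = matcher.group(1);
            // Leave external links, mail links, and in-page anchors untouched.
            boolean keep = target.startsWith("http://") || target.startsWith("https://")
                    || target.startsWith("mailto:") || target.startsWith("#");
            String rewritten = keep ? target : portletUrlFor.apply(target);
            matcher.appendReplacement(out,
                    Matcher.quoteReplacement("href=\"" + rewritten + "\""));
        }
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<a href=\"/docs/a.html\">A</a> <a href=\"http://example.com/\">B</a>";
        // Stand-in for a portlet URL; a real portlet would ask the render response for one.
        System.out.println(rewrite(html, path -> "/portal?PATH=" + path));
    }
}
```

Passing the URL builder in as a function keeps the rewriting logic independent of the portlet API, so the same adapter can be tested outside a portal.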

Any content that relies on JavaScript will probably not work, unless the JavaScript is completely contained in the piece of content. Because you may want to use the content in more than one location, you probably do not want JavaScript embedded in your content. Convince your content producers that they do not need to use scripts. One way to encourage this is to provide support for custom HTML or XML tags, such as <PrinterFriendly>, <PopupWindow>, <DynamicMenu>, or similar tags. Your content delivery applications would render these tags in the appropriate manner for display, or ignore them altogether. This puts more control on the systems side, and takes control away from the content creators. Portals especially need this type of control over content because the content needs to appear in a portlet.

You will have to determine how content delivery through a portal will work. You could display all of your content in new browser windows that open up outside of the portlet window. If you take this path, your CMS portlets would open links to web applications that display the content correctly, with working links, styles, and images. Another approach is to display HTML or XML content inside the portlet, and rewrite the links to any binary data such as PDF files or images to use a servlet for access. Your portlet cannot stream binary data to the user's web browser directly, so an approach like this is necessary. You could also look into ActiveX controls for PDF files, Microsoft Office files, and the like.

Java Content Repository API (JSR 170)

The Java Content Repository API (JCR API, www.jcp.org/en/jsr/detail?id=170) is a common interface to content management systems, just like the portlet API is a common interface for portals. The JCR API is Java Specification Request 170 (JSR 170), and at the time of this writing, it was in public review.
Similar to the portlet API, the motivation for the JSR 170 standard was that each CMS vendor used a different API. Writing applications on top of these proprietary APIs was difficult because the application ran only on one CMS or because porting and maintaining compatibility required lots of development resources. Imagine trying to build an application (for instance, a search engine) that ran on a number of portals and used several different content management systems. Then imagine supporting that application for all the combinations of systems your customers might have.

The advantage of the JCR API is that more applications can take advantage of content management systems: the barrier to entry is lower, and there is less worry about proprietary lock-in. A client application does not have to know the details of how the JSR 170 implementation works on the content management system. The JCR API does not specify a client/server protocol. Because some CMSs organize content in a hierarchy of folders and content items, and others organize content in a flat set, the JCR API can use either type of structure.

The JCR API does not cover all of the possible functions of a content management system. The standard covers the most common functionality for a content repository, but does not include such areas as personalization, publishing, workflow, or taxonomies.

There are two levels of the JCR API. The first level is Level 1, and it includes basic content repository functionality. The main features it includes are

• Retrieving content
• Writing content
• Removing content
• Serializing content
• Searching content
• Changing and retrieving different content types

The more advanced functionality is grouped into Level 2. Level 2 is not required because not every CMS needs that level of complexity.
The advanced features in Level 2 are

• Transactions
• Versions
• Observation
• Access control
• Locks

JCR API Concepts

The JCR API uses several key interfaces for access to most of the content repository functionality. These interfaces model the content management system's internal structure, but building this API on top of a legacy CMS may be difficult. Some of the JCR API types may not be a one-to-one match for existing classes, and some of the APIs and key concepts might be implemented differently in the CMS. You should understand how the concepts described next map onto your CMS, especially noting which functionality is unavailable through the JCR API.

Repository

The javax.jcr.Repository interface models a Java content repository. The repository represents all of the content, relationships, and metadata in the content management system. The content repository contains content workspaces. Your portlets will ask the repository for a ticket that represents access to a workspace for an authenticated user. The repository will need a valid set of credentials for the user.

Ticket

Tickets map authenticated users to workspaces. Ticket objects implement the javax.jcr.Ticket interface. A ticket maps to a single workspace. Each ticket provides access to the repository for the user, but the ticket will keep any changes queued until the portlet either reverts or saves the changes.

Credentials

The javax.jcr.Credentials interface represents the authentication information for the user. If the credentials are valid, the repository will return a valid ticket that grants access to a workspace. Your CMS will implement the interface with whatever information it needs to grant access. This will usually include a username and password, and could include a group or a domain, or other custom authentication attributes.
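
The relationship among repository, credentials, and ticket can be sketched without a JCR implementation. The code below is a self-contained model, not the javax.jcr API: every type in it is a stand-in declared locally, and names such as setValue(), getValue(), revert(), BasicCredentials, and LoginFailedException are assumptions made for illustration. It shows the two behaviors described above: valid credentials yield a ticket bound to one workspace, and a ticket queues its changes until they are saved or reverted.

```java
import java.util.HashMap;
import java.util.Map;

class TicketDemo {
    public static void main(String[] args) throws LoginFailedException {
        Repository repo = new InMemoryRepository().addUser("editor", "secret").addWorkspace("default");
        Ticket author = repo.login(new BasicCredentials("editor", "secret"), "default");
        Ticket reader = repo.login(new BasicCredentials("editor", "secret"), "default");
        author.setValue("/Engineering/Reports/title", "Q2 Report");
        System.out.println(reader.getValue("/Engineering/Reports/title")); // change still queued
        author.save();
        System.out.println(reader.getValue("/Engineering/Reports/title")); // change now saved
    }
}

// Stand-in types modeled on the concepts in the text; NOT the real javax.jcr interfaces.
interface Credentials { String getUserId(); String getPassword(); }

interface Ticket {
    void setValue(String path, String value); // queued until save()
    String getValue(String path);             // sees this ticket's own queued changes
    void save();                              // writes queued changes to the workspace
    void revert();                            // discards queued changes
}

interface Repository {
    Ticket login(Credentials credentials, String workspaceName) throws LoginFailedException;
}

class LoginFailedException extends Exception {
    LoginFailedException(String message) { super(message); }
}

class BasicCredentials implements Credentials {
    private final String userId;
    private final String password;
    BasicCredentials(String userId, String password) { this.userId = userId; this.password = password; }
    public String getUserId() { return userId; }
    public String getPassword() { return password; }
}

class InMemoryRepository implements Repository {
    private final Map<String, String> passwords = new HashMap<>();
    private final Map<String, Map<String, String>> workspaces = new HashMap<>();

    InMemoryRepository addUser(String userId, String password) { passwords.put(userId, password); return this; }
    InMemoryRepository addWorkspace(String name) { workspaces.put(name, new HashMap<>()); return this; }

    public Ticket login(Credentials credentials, String workspaceName) throws LoginFailedException {
        String expected = passwords.get(credentials.getUserId());
        if (expected == null || !expected.equals(credentials.getPassword())) {
            throw new LoginFailedException("Invalid credentials");
        }
        Map<String, String> saved = workspaces.get(workspaceName);
        if (saved == null) throw new LoginFailedException("No such workspace: " + workspaceName);
        return new QueuedTicket(saved); // each ticket maps to exactly one workspace
    }
}

class QueuedTicket implements Ticket {
    private final Map<String, String> saved;                     // shared workspace state
    private final Map<String, String> pending = new HashMap<>(); // this ticket's queued changes
    QueuedTicket(Map<String, String> saved) { this.saved = saved; }
    public void setValue(String path, String value) { pending.put(path, value); }
    public String getValue(String path) { return pending.containsKey(path) ? pending.get(path) : saved.get(path); }
    public void save() { saved.putAll(pending); pending.clear(); }
    public void revert() { pending.clear(); }
}
```

A real repository would also separate the "bad credentials" and "no such workspace" failures into distinct exception types; they are merged here only to keep the sketch short.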

Workspace

Use the javax.jcr.Workspace interface to get access to a content workspace. The repository holds one or more workspaces. Each workspace has a tree of items, which are organized under a root node. Each Workspace object for an authenticated user maps to a Ticket object.

Item

The javax.jcr.Item interface is the base interface for nodes and properties. The workspace consists of a tree of items.

Node

Nodes represent individual entities in the content management system, and are implementations of the javax.jcr.Node interface. They could be pieces of content, folders, documents, products, or anything else. Each node can have zero or more child nodes. With CMS support, nodes may also have more than one parent node. Nodes can have zero or more properties.

Each node has one primary node type, but can also have multiple mixin node types. Mixin types describe additional information about a node, beyond its primary node type. Each primary node type inherits from the nt:base node type, which must be supported. The CMS may define its own node types below it in the hierarchy. Some predefined (but optional) node types are nt:file, nt:folder, nt:version, and nt:query. Certain primary node types require mixin types, and others allow only certain mixin types.

Nodes can have versions, although the node must have the mixin node type mix:versionable.

Property

Properties are children of nodes, and have only one parent node. The property interface is javax.jcr.Property. Properties represent pieces of metadata about nodes. The values of properties must conform to allowed property types, which include strings, binary data, dates, longs, doubles, and booleans. Properties may also be soft links or references. Soft links are links to paths in the content repository. These are soft references; the linked content may be moved, deleted, or may not even exist.
The soft link's path can be absolute or relative. References are hard links to nodes. They link by the node ID (UUID), and the node they point to must exist. If a reference exists to a node, that reference must be deleted before the node may be moved or deleted. Some properties may have multiple values.

Path

A path points to an item in the repository. Paths may be either relative or absolute. /Engineering/Reports/11222.doc is an example of an absolute path in the repository. Reports/11222.doc is a relative path, just as in a file system. Your portlet may get a node through the ticket by its absolute path. If two or more nodes under the same parent node have the same name, the path can be tricky. You will have to use array-based notation (starting at 1, not 0) to reference the node you want.

Search

One of the most interesting features of the JCR API is its search support. Because JCR defines a standard set of query interfaces, it should be possible to create a search portlet that can execute a search and display search results for any content management system. Compare that with the Lucene portlet we built in Chapter 10. The Lucene portlet's basics are the same for every CMS, especially if you use a standard set of fields. The difficulty with Lucene is writing classes that synchronize the contents of the CMS with the Lucene index, especially if you are integrating multiple systems. We expect that many JCR API implementations will use Lucene to provide search capabilities.

The JCR API defines two query languages for the search function:

• JCRQL (with SSES): Java Content Repository Query Language (with Simple Search Engine Syntax) is similar to SQL, but has extensions for the hierarchical content model and also supports standard search query terms.
• XPath: XPath 2.0 is an XML technology for searching through a hierarchical XML document and extracting elements that match an XPath expression.
The JCR API XPath query language supports a subset of the XPath 2.0 functionality plus some extensions needed for the JCR API.

Each content repository has to support at least one of these query languages. Each CMS can also support additional languages, for instance a Google-style query language, or a Lucene query language with named fields. This means your application will need to know which query language the CMS supports. The javax.jcr.query.QueryManager class has a getSupportedQueryLanguages() method that will return the supported languages. If you are building a general-purpose application, you will probably need to support both of the standard query languages. This way, your application will run on any JCR API–compliant CMS. Your support may just be limited to different help files for the search engine because the QueryManager class also parses the query from the user's statement.

Development with the JCR API

The JCR API classes belong to the javax.jcr package and its subpackages. To start developing with the JCR API, you will need to select and install a server that implements the standard. The standard is still quite new, so we expect that a reference implementation of the JCR API will be released around the time that this book is published. Some of the details of the API may have changed since the public review, but all of the major concepts should be the same.

The first step with the JCR API is to obtain a javax.jcr.Repository object. Your content management system should include directions for getting an instance of Repository, because this is one area of the API that is not standardized. The authors of the specification expect that a JNDI lookup will be a common approach.
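
Since the lookup mechanism is left to each CMS, the following is only a sketch of what a JNDI-based lookup could look like. The class RepositoryLocator and the JNDI name in the usage example are hypothetical; your CMS documentation determines the real name and the type to cast the result to. The code uses only the standard javax.naming classes and returns the bound object untyped, because the JCR interfaces are not assumed to be on the classpath.

```java
import java.util.Optional;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper; the JNDI name is an assumption, not a standardized value.
class RepositoryLocator {
    // Returns whatever object is bound under jndiName, or empty if the lookup fails
    // (for example, outside an application server where no naming context exists).
    static Optional<Object> lookup(String jndiName) {
        try {
            return Optional.ofNullable(new InitialContext().lookup(jndiName));
        } catch (NamingException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Optional<Object> repository = lookup("java:comp/env/jcr/repository");
        System.out.println(repository.isPresent()
                ? "Found repository binding: " + repository.get().getClass().getName()
                : "No repository bound under that name");
    }
}
```

Inside a portlet, the caller would cast the result to the repository type supplied by the CMS and report a configuration error when the Optional is empty.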

Repository is an interface with one method, login():

    public Ticket login(Credentials credentials, String workspaceName)
        throws LoginException, NoSuchWorkspaceException

The login() method takes a set of credentials and a workspace name. The javax.jcr.Credentials interface consists of a getUserId() method; a getPassword() method; and several methods for storing, setting, and removing attributes on the credentials. The JCR API provides a basic implementation of the Credentials interface with the javax.jcr.SimpleCredentials class. You can create a new instance of SimpleCredentials by calling its constructor and passing a user ID and password as arguments.

Upon successful authentication, the login() method returns a Ticket object. The javax.jcr.Ticket interface is the main gateway for your client to access the content repository. From the ticket, you can get the root node of the workspace, or you can get a node by its absolute path. You can also import an XML document that represents new items.

Once you have a node, you can continue traversing the tree by relative paths. The Node interface has methods for retrieving and setting the node's properties. You can also create new nodes or add existing nodes as children. After you make any changes, you will have to commit your changes by saving the node. You can also save all of your changes for the workspace by calling the save() method on the ticket.

Retrieving a document out of a content repository with the JCR API is simple. When you have a node with the primary type nt:file, that node will have a child node called jcr:content. The jcr:content node holds the content in one of its properties, which could be called data. You could get the value of the data property, and then pass it back through to the portlet.

WebDAV

WebDAV (www.webdav.org) is a commonly implemented protocol for connecting to content management systems and other content stores.
The WebDAV specification (RFC 2518) can be found at www.webdav.org/specs/rfc2518.htm. Many applications and operating systems are WebDAV compatible. A non-exhaustive list of compatible client applications follows:

• Microsoft Word
• Microsoft Excel
• Adobe Photoshop
• Macromedia Dreamweaver
• Apple Mac OS X
• Microsoft Windows XP
• Altova XML Spy 2004

All of these applications are able to connect to a WebDAV-compatible server. Apache Tomcat comes with a WebDAV servlet that provides WebDAV access to files on the file system. Apache Slide (http://jakarta.apache.org/slide) is an open source content management system that has a WebDAV server and a command-line WebDAV client. Slide also has a WebDAV client library for Java, which we will use to build a WebDAV client portlet. WebDAV is an extension of the HTTP 1.1 protocol, so it is relatively easy to implement.

WebDAV Methods

If you are already familiar with the GET and POST HTTP methods, the WebDAV methods will look very similar. WebDAV adds many new methods beyond GET, [...]

[...]
    final String CHANGE_COLL = "CHANGE_COLLECTION";
    public static final String DISPLAY_CONTENT = "DISPLAY_CONTENT";
    public static final String DISPLAY_PARENT = "DISPLAY_PARENT";
    public static final String PATH = "PATH";

    WebDAVHelper helper;

    public void init(PortletConfig config) throws PortletException {
        super.init(config);
[...]
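
Returning to the WebDAV methods introduced above: because they are ordinary HTTP requests with new verbs, a client can form one with any HTTP library that accepts custom method names. The sketch below is not the Slide client library used in this chapter; it assumes Java 11 or later and its standard java.net.http package, and the class name PropfindRequestSketch and the server URL are made up for illustration. It builds, without sending, a PROPFIND request, the RFC 2518 method for retrieving properties of a resource and, with a Depth of 1, of its immediate children.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical sketch; builds a request object only and performs no network I/O.
class PropfindRequestSketch {
    // Asks for all properties of the resource at url; depth 0 means the resource
    // itself, depth 1 adds its immediate children.
    static HttpRequest propfind(String url, int depth) {
        String body = "<?xml version=\"1.0\" encoding=\"utf-8\"?>"
                + "<D:propfind xmlns:D=\"DAV:\"><D:allprop/></D:propfind>";
        return HttpRequest.newBuilder(URI.create(url))
                .method("PROPFIND", HttpRequest.BodyPublishers.ofString(body))
                .header("Depth", String.valueOf(depth))
                .header("Content-Type", "text/xml; charset=\"utf-8\"")
                .build();
    }

    public static void main(String[] args) {
        // The URL is a placeholder for a WebDAV collection on your own server.
        HttpRequest request = propfind("http://localhost:8080/slide/files/", 1);
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Depth: " + request.headers().firstValue("Depth").orElse(""));
    }
}
```

Sending the request with an HttpClient would return a multistatus XML response from a compliant server, which the client then has to parse; a client library such as Slide's hides that step.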

[...] standards for integrating content management systems into portlets: Java Content Repository API and WebDAV. Although the JCR API is a new standard, we expect that CMS vendors will adopt the standard quickly. Many client and server applications already use WebDAV. We discussed the Apache Slide WebDAV client library, and then used the library in a CMS portlet. Our portlet displays the content available through [...]

[...] The list() method returns a String array of pathnames to the children. The listBasic() method returns the child resources' path names, content length, either collection or a content type, and the last modified date. This information is stored in an array for each resource, and then each array is stored in a Vector [...]

[...] name="COMMAND" value="DISPLAY_CONTENT" /> [...]